Pandas Dataframe

Posted under » Python Data Analysis on 12 June 2023

From Pandas intro.

While Series is like a column, a DataFrame is the whole table.

In its simplest form:

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data)

print(df)

   calories  duration
0       420        50
1       380        40
2       390        45
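A quick way to confirm the table's size and labels is to inspect `shape`, `columns` and `index`. This is a minimal sketch that rebuilds the same `data` dictionary:

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# shape is (number of rows, number of columns)
print(df.shape)          # (3, 2)
# the dictionary keys become the column labels, in insertion order
print(list(df.columns))  # ['calories', 'duration']
# without an explicit index, rows are labelled 0, 1, 2
print(list(df.index))    # [0, 1, 2]
```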

Pandas uses the loc attribute to return one or more specified rows. A single label returns that row as a Series:

print(df.loc[0])

calories    420
duration     50
Name: 0, dtype: int64

To return two rows, pass a list of labels. When you use a list inside [], the result is a Pandas DataFrame.

print(df.loc[[0, 1]])

   calories  duration
0       420        50
1       380        40
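To see the difference between the two results, you can check their types. A minimal sketch that rebuilds the same table; the variable names `one_row` and `two_rows` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one_row = df.loc[0]        # a single label returns a Series
two_rows = df.loc[[0, 1]]  # a list of labels returns a DataFrame

print(type(one_row).__name__)   # Series
print(type(two_rows).__name__)  # DataFrame
print(one_row["calories"])      # 420
```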

Learn how to update a dataframe.

With the index argument, you can name your own indexes.

import pandas as pd

data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}

df = pd.DataFrame(data, index = ["day1", "day2", "day3"])

print(df) 

      calories  duration
day1       420        50
day2       380        40
day3       390        45
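The labels passed to index replace the default 0, 1, 2 row numbers, and there must be exactly one label per row. A minimal sketch showing both points (the mismatched two-label index is a deliberately broken, illustrative input):

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

# the custom labels are now the row index
print(list(df.index))  # ['day1', 'day2', 'day3']

# the number of labels must match the number of rows
raised = False
try:
    pd.DataFrame(data, index=["day1", "day2"])  # two labels for three rows
except ValueError as err:
    raised = True
    print("ValueError:", err)
```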

print(df[['calories', 'duration']])

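Both print(df) and print(df[['calories', 'duration']]) return the same output, because the list inside the double brackets selects both columns. To see a single column, such as duration, pass just its name in single brackets; the result is a Series: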

print(df['duration'])

day1    50
day2    40
day3    45
Name: duration, dtype: int64
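Single and double brackets give different types even for one column. A minimal sketch; `col_series` and `col_frame` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

col_series = df["duration"]   # one label -> Series
col_frame = df[["duration"]]  # a list with one label -> DataFrame

print(type(col_series).__name__, col_series.tolist())  # Series [50, 40, 45]
print(type(col_frame).__name__, col_frame.shape)       # DataFrame (3, 1)
```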

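To save the contents to a text file, or to format each row yourself, loop over the rows with iterrows(), which yields an (index, row) pair for every row. Opening the file once in a with block ensures it is closed afterwards:

with open('output.txt', 'a') as f:
    for index, row in df.iterrows():
        print(index, ': ', row['duration'], file=f)

output.txt then contains:

day1 :  50
day2 :  40
day3 :  45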
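If you only need one label/value pair per line, pandas can write the file without an explicit loop. A minimal sketch using Series.to_csv; the ":" separator and the file name output.txt are assumptions carried over from the loop above. Note that to_csv writes day1:50 without the extra spaces, and by default it overwrites the file instead of appending:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

# header=False drops the "duration" header line; the index is written by default
df["duration"].to_csv("output.txt", sep=":", header=False)

with open("output.txt") as f:
    lines = f.read().splitlines()

print(lines)  # ['day1:50', 'day2:40', 'day3:45']
```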

Another print example.

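You can also achieve the same with itertuples(), which is generally faster than iterrows(). Each row is a named tuple, and the index label is available as row.Index:

for row in df.itertuples():
    print(row.Index, ': ', row.duration)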
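To see what itertuples() actually yields, inspect the first row. A minimal sketch; `rows`, `first` and `plain` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

rows = list(df.itertuples())
first = rows[0]

# the index label comes first as the "Index" field, then one field per column
print(first._fields)                # ('Index', 'calories', 'duration')
print(first.Index, first.duration)  # day1 50

# index=False drops the label; name=None yields plain tuples instead of named tuples
plain = list(df.itertuples(index=False, name=None))
print(plain)
```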

Use the named index in the loc attribute to return the specified row(s).

print(df.loc["day2"])

calories    380
duration     40
Name: day2, dtype: int64

If more than one row is labelled "day2", all of them are returned, and the result is a DataFrame instead of a Series.
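A minimal sketch of the duplicate-label case. The fourth row (a second "day2" with 400 calories and 42 minutes) is hypothetical data added only to create the duplicate:

```python
import pandas as pd

# hypothetical data: "day2" appears twice in the index
df = pd.DataFrame(
    {"calories": [420, 380, 390, 400], "duration": [50, 40, 45, 42]},
    index=["day1", "day2", "day3", "day2"],
)

dup = df.loc["day2"]
print(type(dup).__name__)  # DataFrame
print(dup)
```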

You can assign each result to a variable. Each one is a Series:

first = df.loc["day2"]

second = df.loc["day3"]

print(first, second, sep="\n\n")

calories    380
duration     40
Name: day2, dtype: int64

calories    390
duration     45
Name: day3, dtype: int64
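Because each stored row is a Series indexed by column name, you can read individual values from it. A minimal sketch; the subtraction is only an illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

first = df.loc["day2"]   # Series indexed by column name
second = df.loc["day3"]

print(first["calories"])                       # 380
print(second["duration"] - first["duration"])  # 5
# a single cell can also be read directly with a row label and a column label
print(df.loc["day2", "calories"])              # 380
```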

Just as iterrows() loops over rows, you can loop over all or some of the columns of a DataFrame. Iterating over a DataFrame yields its column names:

for column in df[['calories', 'duration']]:
    columnSeriesObj = df[column]
    print('Column Name : ', column)
    print('Column Contents : ', columnSeriesObj.values)

Column Name :  calories
Column Contents :  [420 380 390]
Column Name :  duration
Column Contents :  [50 40 45]
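DataFrame.items() yields (column name, Series) pairs directly, so the separate df[column] lookup is not needed. A minimal sketch; the `collected` dictionary is only there to show what the loop sees:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

collected = {}
# items() yields (column name, column Series) pairs in column order
for name, series in df.items():
    print("Column Name : ", name)
    print("Column Contents : ", series.values)
    collected[name] = series.tolist()
```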

cont... Read data into Dataframe
