Posted under » Python Data Analysis on 12 June 2023
Continued from Pandas intro: Series. While a Series is like a single column, a DataFrame is a whole table. A Series behaves much like a NumPy array, which we will cover later.
In its simplest form:
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)
   calories  duration
0       420        50
1       380        40
2       390        45
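You can check the structure described above by inspecting the shape, the column labels and the type of a single column. This is a minimal sketch that reuses the same calories/duration data. It shows that the dict keys become the column labels and that each column comes back as a Series.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# (rows, columns)
print(df.shape)                       # (3, 2)
# Column labels come from the dict keys, in insertion order
print(list(df.columns))               # ['calories', 'duration']
# Without an index argument, rows are labelled 0, 1, 2
print(list(df.index))                 # [0, 1, 2]
# Each column of a DataFrame is a Series
print(type(df["calories"]).__name__)  # Series
```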
Pandas uses the loc attribute to return one or more specified rows.
print(df.loc[0])
calories    420
duration     50
Name: 0, dtype: int64
To return 2 rows, pass a list of labels. When you use double brackets [[...]] (a list inside []), the result is a Pandas DataFrame.
print(df.loc[[0, 1]])
   calories  duration
0       420        50
1       380        40
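The difference between a single label and a list of labels is easy to check. This sketch assumes the same df with the default 0, 1, 2 index. loc[0] gives a Series, loc[[0, 1]] gives a DataFrame, and a row label combined with a column label gives a single value.

```python
import pandas as pd

df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})

one_row = df.loc[0]             # single label -> Series
two_rows = df.loc[[0, 1]]       # list of labels -> DataFrame
value = df.loc[0, "calories"]   # row label and column label -> scalar

print(type(one_row).__name__)   # Series
print(type(two_rows).__name__)  # DataFrame
print(two_rows.shape)           # (2, 2)
print(value)                    # 420
```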
Learn how to update a dataframe.
With the index argument, you can name your own indexes.
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
print(df)
      calories  duration
day1       420        50
day2       380        40
day3       390        45
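With named indexes, the labels themselves become the row keys for loc. In this minimal sketch, the index holds the strings passed through index=. loc accepts those labels directly, either as a list of labels or as a row/column pair for a single cell.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

print(list(df.index))             # ['day1', 'day2', 'day3']

# A list of labels returns a DataFrame with just those rows
subset = df.loc[["day1", "day3"]]
print(subset)

# Row label plus column label returns one value
print(df.loc["day2", "duration"])  # 40
```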
print(df[['calories', 'duration']])
Both print(df) and print(df[['calories', 'duration']]) return the same output, because the list selects every column. If you only want to see the duration, pass a single column name:
print(df['duration'])
day1    50
day2    40
day3    45
Name: duration, dtype: int64
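The key distinction here is single versus double brackets. df['duration'] returns a Series, which is what the Name:/dtype: footer in the output indicates. df[['duration']] returns a one-column DataFrame instead. This short sketch confirms it.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

col_series = df["duration"]    # one label -> Series
col_frame = df[["duration"]]   # list with one label -> DataFrame

print(type(col_series).__name__)  # Series
print(type(col_frame).__name__)   # DataFrame
print(col_series.tolist())        # [50, 40, 45]
```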
If you want to save the contents to a text file, or format them your own way, you can loop over the rows with a for loop and iterrows():
# Open the file once and append one line per row
with open('output.txt', 'a') as f:
    for index, row in df.iterrows():
        print(f"{index} : {row['duration']}", file=f)
output.txt then contains:
day1 : 50
day2 : 40
day3 : 45
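Below is a self-contained version of the file-writing loop. It is a sketch only, and it assumes Python's tempfile module for the demo so that nothing is left in your working directory. The file is opened once in a with block and then read back. It uses mode "w" so that reruns overwrite the file, whereas the loop above uses "a" and keeps appending.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.txt")
    # Open the file once; "w" overwrites, "a" would append on each run
    with open(path, "w") as f:
        for index, row in df.iterrows():
            print(f"{index} : {row['duration']}", file=f)
    # Read the result back before the temporary directory is removed
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)  # ['day1 : 50', 'day2 : 40', 'day3 : 45']
```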
Another print example.
You can also achieve the same with itertuples(). Here the row label is available as row.Index:
for row in df.itertuples():
    print(row.Index, ': ', row.duration)
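itertuples() yields named tuples. The first field is Index, followed by one field per column. That is why row.duration works and why the row label comes from row.Index. This sketch collects the pairs instead of printing them.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

pairs = []
for row in df.itertuples():
    # row is a namedtuple with fields (Index, calories, duration)
    pairs.append((row.Index, int(row.duration)))

print(row._fields)  # ('Index', 'calories', 'duration')
print(pairs)        # [('day1', 50), ('day2', 40), ('day3', 45)]
```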
Use the named index in the loc attribute to return the specified row(s).
print(df.loc["day2"])
calories    380
duration     40
Name: day2, dtype: int64
If more than one row is labelled "day2", all of them are returned, and the result is a DataFrame rather than a Series.
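The duplicate-label behaviour is easy to verify. This sketch adds a hypothetical extra "day2" row whose values are made up for illustration. With that row in place, loc returns a DataFrame containing every matching row instead of a single Series.

```python
import pandas as pd

# Hypothetical data: "day2" appears twice in the index
df = pd.DataFrame(
    {"calories": [420, 380, 390, 300], "duration": [50, 40, 45, 30]},
    index=["day1", "day2", "day3", "day2"],
)

unique_row = df.loc["day1"]  # label appears once -> Series
dup_rows = df.loc["day2"]    # label appears twice -> DataFrame

print(type(unique_row).__name__)      # Series
print(type(dup_rows).__name__)        # DataFrame
print(dup_rows["calories"].tolist())  # [380, 300]
```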
You can also assign each result to a variable.
first = df.loc["day2"]
second = df.loc["day3"]
print(first, "\n\n\n", second)
calories    380
duration     40
Name: day2, dtype: int64


 calories    390
duration     45
Name: day3, dtype: int64
Just like iterrows() above, you can do the same with columns, looping over all or only certain columns of a DataFrame. Iterating over a DataFrame yields its column labels:
for column in df[['calories', 'duration']]:
    columnSeriesObj = df[column]
    print('Column Name : ', column)
    print('Column Contents : ', columnSeriesObj.values)
Column Name : calories
Column Contents : [420 380 390]
Column Name : duration
Column Contents : [50 40 45]
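DataFrame.items() is a built-in alternative. It yields (column label, Series) pairs directly, so the df[column] lookup is not needed. This sketch prints the same information as the loop above and also collects each column into a dict.

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

contents = {}
for column, series in df.items():
    # series is the whole column as a pandas Series
    print('Column Name : ', column)
    print('Column Contents : ', series.values)
    contents[column] = series.tolist()

print(contents)  # {'calories': [420, 380, 390], 'duration': [50, 40, 45]}
```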
cont... Read data into Dataframe