From my understanding, there are two ways to subset a dataframe in pandas:
a) df['columns']['rows']
b) df.loc['rows', 'columns']
I was following a guided case study, where the instruction was to select the first and last n rows of a column in a dataframe. The solution used Method A, whereas I tried Method B.
My method wasn't working and I couldn't for the life of me figure out why.
I've created a simplified version of the dataframe...
import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]
# Integer index labels run from 1 to 10 (not 0 to 9)
df = pd.DataFrame({'Male': male,
                   'Female': female},
                  index=np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis=1).round(1)
df
Selecting the first two rows works fine with both Method A and Method B:
print('Method A: \n', df['Mean'][:2])
print('Method B: \n', df.loc[:2, 'Mean'])
Method A:
1 7.5
2 12.5
Method B:
1 7.5
2 12.5
Selecting the last two rows, however, does not behave the same way. Method A returns the last two rows as it should. Method B (.loc) doesn't: it returns the entire 'Mean' column. Why is this, and how do I fix it?
print('Method A: \n', df['Mean'][-2:])
print('Method B: \n', df.loc[-2:, 'Mean'])
Method A:
9 11.5
10 14.5
Method B:
1 7.5
2 12.5
3 9.0
4 11.5
5 16.0
6 13.5
7 13.0
8 12.5
9 11.5
10 14.5
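For reference, here is a minimal sketch of what seems to be going on, assuming the same dataframe as above. As far as I can tell, .loc slices by index label rather than by position, so -2: is read as "every label from -2 upward", which covers all of the labels 1 to 10. The variable names below are only illustrative, and the three alternatives are options I believe are equivalent here, not necessarily the intended solution.

```python
import numpy as np
import pandas as pd

male = [6, 14, 12, 13, 21, 14, 14, 14, 14, 18]
female = [9, 11, 6, 10, 11, 13, 12, 11, 9, 11]
df = pd.DataFrame({'Male': male, 'Female': female},
                  index=np.arange(1, 11))
df['Mean'] = df[['Male', 'Female']].mean(axis=1).round(1)

# .loc is label-based: on this sorted integer index, -2: means
# "labels >= -2", which matches every row.
label_slice = df.loc[-2:, 'Mean']

# Position-based selection of the last two rows.
last_two_iloc = df['Mean'].iloc[-2:]

# Staying with .loc: look up the last two labels first, then select by label.
last_two_labels = df.loc[df.index[-2:], 'Mean']

# tail() is another way to take the last n rows.
last_two_tail = df['Mean'].tail(2)

print(len(label_slice))
print(last_two_iloc)
```

If that reading is right, .iloc or tail() would be the direct replacement for the negative slice, while df.index[-2:] keeps the selection label-based.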