8

I am very new to pandas and am trying to use groupby. I have a DataFrame with multiple columns.

  • I want to group by one column and then sort each group by a different column.
  • Specifically, I want to group by col1, sort each group by col5, and then call reset_index to get back all rows of the DataFrame.
  • I get the following error: AttributeError: Cannot access callable attribute 'reset_index' of 'DataFrameGroupBy' objects, try using the 'apply' method.

My input dataframe:

col1 | col2 | col3 | col4 | col5
=================================
A    | A1   | A2   | A3   | DATE1
A    | B1   | B2   | B3   | DATE2

My code:

df.sort_values(['col5'],ascending=False).groupby('col1').reset_index()
smci
Gingerbread

3 Answers

6

A groupby needs to be followed by an aggregation function such as mean, sum, or max:

df.sort_values(['col5'],ascending=False).groupby('col1').mean().reset_index()

Or:

df.sort_values(['col5'],ascending=False).groupby('col1', as_index=False).mean()
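
Note that an aggregation returns one row per group, so the result has fewer rows than the input. Here is a minimal sketch of that behaviour, assuming a small made-up frame (the `value` column and its numbers are hypothetical, not from the question). In recent pandas versions `mean` may also raise on non-numeric columns unless `numeric_only=True` is passed, so the sketch uses a numeric column only.

```python
import pandas as pd

# Hypothetical data: three rows, two distinct groups in col1.
df = pd.DataFrame({
    "col1": ["A", "A", "B"],
    "value": [1.0, 3.0, 5.0],
})

# Aggregating collapses each group into a single row.
agg = df.groupby("col1", as_index=False).mean()

print(len(df), "rows before,", len(agg), "rows after")
```

If every original row has to survive, an aggregation is probably not the right tool.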
jezrael
  • This reduces the shape of my dataframe from (124,14) to (9,6). I want all the 124 rows. Can you please help? – Gingerbread May 22 '18 at 10:31
  • Sorry this is not an answer in the light of the code the OP posted. They're doing a sort, not summary functions. And they want all the df rows. – smci Dec 31 '19 at 12:23
2

You can try the code below; I had a similar issue.

grouped=data.groupby(['Colname'])
grouped.apply(lambda _df: _df.sort_values(by=['col_to_be_sorted']))
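
If the only goal is to have rows ordered within each group while keeping every row, a groupby may not be needed at all: sorting on the group column and the sort column together should give the same ordering. This is a minimal sketch, assuming made-up sample data with the question's column names (the values and dates are hypothetical).

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "col1": ["A", "B", "A", "B"],
    "col2": ["A1", "B1", "C1", "D1"],
    "col5": pd.to_datetime(
        ["2018-01-01", "2018-03-01", "2018-02-01", "2018-01-15"]
    ),
})

# Sort by group first, then by col5 descending inside each group,
# and rebuild a clean 0..n-1 index. No rows are dropped.
result = (
    df.sort_values(["col1", "col5"], ascending=[True, False])
      .reset_index(drop=True)
)

print(result)
```

The `ascending` list pairs up with the sort columns, so each column can have its own direction.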

hakuna_code
1

You can use:

grouped = df.sort_values(['col5'],ascending=False).groupby('col1',as_index = False).apply(lambda x: x.reset_index(drop = True))
grouped.reset_index().drop(['level_0','level_1'],axis = 1)

For a clear explanation with an example, refer to the Stack Overflow question "How to reset a DataFrame's indexes for all groups in one step?"