Sorting the Pandas DataFrame Describe output

Question

I am trying to sort the output of describe() with count. Not sure, how to solve.

Tried sort_by and .loc but none of it serves the purpose of sorting the output of describe.

Need to edit the below line of code:

df.groupby("Disease_Category")['Approved_Amt'].describe().reset_index()

Current Output

Disease_Category count mean std min 25% 50% 75% max
1 Disease1 5.0 82477.600000 51744.487632 30461.0 58318.00 72201.0 83408.00 168000.0 

2 Disease2 190.0 35357.163158 46268.552683 1707.0 13996.25 22186.0 36281.75 331835.0

Desired Output

Disease_Category count mean std min 25% 50% 75% max
1 Disease2 190.0 35357.163158 46268.552683 1707.0 13996.25 22186.0 36281.75 331835.0 

2 Disease1 5.0 82477.600000 51744.487632 30461.0 58318.00 72201.0 83408.00 168000.0

The sorting functions are `sort()` and `sort_values()`. Where did you get `sort_by()` ? — kerwei, May 07 '19 at 04:21

score 4 · Accepted Answer · answered May 07 '19 at 04:27

4

Use sort_values(). Documentation.

import pandas as pd

df1 = df.groupby("Disease_Category")['Approved_Amt'].describe().reset_index()
>>>df1
  Disease_Category  count   mean    std    min       25%    50%       75%     max
0         Disease1      5  82477  51744  30461  58318.00  72201  83408.00  168000
1         Disease2    190  35357  46268   1707  13996.25  22186  36281.75  331835

>>>df1.sort_values('count', ascending=False)
  Disease_Category  count   mean    std    min       25%    50%       75%     max
1         Disease2    190  35357  46268   1707  13996.25  22186  36281.75  331835
0         Disease1      5  82477  51744  30461  58318.00  72201  83408.00  168000

answered May 07 '19 at 04:27

kerwei

1,822
1
13
22

1

Thanks for that. Isn't there a way to add sort_values in the same line as describe command. – Shins May 10 '19 at 09:24
@Shins You should be able to chain it up, as long as the structure remains a dataframe. `df.groupby("Disease_Category")['Approved_Amt'].describe().reset_index().sort_values('count', ascending=False)`. You just have to weigh it against readability etc. – kerwei May 10 '19 at 14:03
Hello I have this error : # Check for duplicates KeyError: 'count' , any idea ? – harmonius cool Jul 23 '20 at 09:55
@harmoniuscool Are you trying to describe a field named 'count' too? – kerwei Aug 04 '20 at 15:34

score 1 · Answer 2 · answered May 07 '19 at 04:04

This should work.

import pandas as pd

df = pd.DataFrame({'Disease' : ['Disease1', 'Disease2'],
                   'Category' : [5,190],
                   'count' : [82477, 35357],
                   'mean' : [51744, 46268],
                   'std' : [30461, 1707],
                   'etc' : [1,2]})

print(df)
#   Category   Disease  count  etc   mean    std
#0         5  Disease1  82477    1  51744  30461
#1       190  Disease2  35357    2  46268   1707

# invert rows of dataframe so last row becomes the first
df = df.reindex(index = df.index[::-1])

df = df.reset_index()

#   index  Category   Disease  count  etc   mean    std
#0      1       190  Disease2  35357    2  46268   1707
#1      0         5  Disease1  82477    1  51744  30461

If you're inverting the row sequence then it isn't sorting. There isn't any semantics behind the action to determine the order of the rows. If it works, it's basically because the rows are already sorted in the first place and by inverting it, you're changing it from ascending to descending etc. — kerwei, May 07 '19 at 04:11
Agree, inverting the index is an ideal solution since I am looking at sorting 'Count' and I have around 1500 rows to sort. — Shins, May 10 '19 at 09:31

Sorting the Pandas DataFrame Describe output

Current Output

Desired Output

2 Answers2

Linked