pandas: sorting observations within groupby groups

Question

According to the answer to "pandas groupby sort within groups", in order to sort observations within each group one needs to do a second groupby on the results of the first groupby. Why is a second groupby needed? I would've assumed that observations are already arranged into groups after running the first groupby, and that all that would be needed is a way to enumerate those groups (and run apply with order).

score 17 · Accepted Answer · answered Mar 18 '16 at 01:26

Because once you apply a function after a groupby, the results are combined back into a normal, ungrouped data frame. Using groupby together with a groupby method such as sort should be thought of as a Split-Apply-Combine operation.

The groupby splits the original data frame and the method is applied to each group, but then the results are combined again implicitly.
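To make the implicit combine step concrete, here is a minimal sketch. The sample data and the variable names (`grouped`, `totals`) are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "market", "market"],
    "source": ["A", "B", "C", "A", "B"],
    "count": [2, 4, 1, 5, 3],
})

# Split: groupby alone only builds a grouped object, nothing is computed yet.
grouped = df.groupby("job")

# Apply + combine: calling a method on the grouped object returns an
# ordinary DataFrame again, so the grouping is no longer attached to it.
totals = grouped[["count"]].sum()

print(type(grouped).__name__)
print(type(totals).__name__)
print(totals)
```

Since `totals` is a plain DataFrame, any further per-group operation on it has to start with another groupby.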

In that other question, they could have reversed the order of operations (sorted first) and then would not have needed two groupbys. They could do:

df.sort_values(['job','count'],ascending=False).groupby('job').head(3)

Note: `sort` is deprecated; use `sort_values`. – tread, May 14 '18 at 08:34
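A small runnable sketch of the sort-first approach follows. The data is hypothetical and the name `top3` is mine; it only assumes a frame with `job` and `count` columns like the one in the linked question.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "sales", "market", "market"],
    "source": ["A", "B", "C", "D", "A", "B"],
    "count": [2, 4, 1, 3, 5, 3],
})

# Sort the whole frame once, then take the first 3 rows of each job.
# head() on a grouped object keeps rows in their current (sorted) order,
# so a single groupby is enough.
top3 = (
    df.sort_values(["job", "count"], ascending=False)
      .groupby("job")
      .head(3)
)

print(top3)
```

Because `ascending=False` applies to both keys here, the jobs themselves also come out in reverse alphabetical order; passing a list such as `ascending=[True, False]` would keep jobs ascending while still ranking counts from highest to lowest.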

score 2 · Answer 2 · answered Jun 06 '19 at 08:02

They need a second group by in that case, because on top of sorting, they want to keep only the top 3 rows of each group.

If you just need to sort after a group by, you can do:

df_res = df.groupby(['job','source']).agg({'count':'sum'}).sort_values(['job','count'],ascending=False)

One group by is enough.

And if you want to keep the 3 rows with the highest count for each group, then you can group again and use the head() function:

df_res.groupby('job').head(3)
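Put together, the two steps from this answer might look like the sketch below. The sample data is hypothetical, and it assumes a reasonably recent pandas in which `sort_values` and `groupby` accept the name of an index level (`job`) alongside column names.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration.
# ("sales", "A") appears twice so the aggregation has something to sum.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "sales", "sales", "market", "market"],
    "source": ["A", "A", "B", "C", "D", "A", "B"],
    "count": [1, 1, 4, 1, 3, 5, 3],
})

# First groupby: aggregate counts per (job, source), then sort the combined,
# ungrouped result. 'job' is an index level at this point, 'count' a column.
df_res = (
    df.groupby(["job", "source"])
      .agg({"count": "sum"})
      .sort_values(["job", "count"], ascending=False)
)

# Second groupby: needed only because we want the top 3 rows per job.
top3 = df_res.groupby("job").head(3)

print(top3)
```

The second groupby is there for `head(3)`, not for the sorting itself; the sort already happened on the ungrouped result of the first groupby.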
