get max value after grouping two columns pandas

Question

I want to get the trip_time with the highest count for each age_group

age_group	trip_time	counts
18 - 30yrs	01am	23
18 - 30yrs	02am	2
18 - 30yrs	03am	213
31 - 50yrs	01am	74
31 - 50yrs	02am	211
31 - 50yrs	03am	852
51 - 70yrs	01am	23
51 - 70yrs	02am	11
51 - 70yrs	03am	101

Expected output:

age_group	trip_time	counts
18 - 30yrs	03am	213
31 - 50yrs	03am	852
51 - 70yrs	03am	101

trip_time_age_group.groupby(['age_group', 'trip_time'])['counts'].max()

But it gives me wrong result

score 2 · Accepted Answer · edited Sep 23 '22 at 19:14

2

groupby 'age-group' and then use transform to get the max count for each age-group, which you then compare with the count of the DF, to get the resultset

df.loc[df.groupby('age_group')['counts'].transform('max').eq(df['counts'])]

    age_group trip_time  counts
2  18 - 30yrs      03am     213
5  31 - 50yrs      03am     852
8  51 - 70yrs      03am     101

edited Sep 23 '22 at 19:14

wjandrea

28,235
9
60
81

answered Sep 23 '22 at 19:06

Naveed

11,495
2
14
21

score 1 · Answer 2 · answered Sep 23 '22 at 19:29

Groupby only age_group, then find the rows in each group with the max counts.

df.groupby('age_group').apply(
    lambda sf: sf.loc[sf['counts'] == sf['counts'].max()]
).reset_index(drop=True)

    age_group trip_time  counts
0  18 - 30yrs      03am     213
1  31 - 50yrs      03am     852
2  51 - 70yrs      03am     101

You could also do sf.query('counts == counts.max()') instead of sf.loc[...].

get max value after grouping two columns pandas

2 Answers2