0

I want to get the trip_time with the highest count for each age_group

age_group trip_time counts
18 - 30yrs 01am 23
18 - 30yrs 02am 2
18 - 30yrs 03am 213
31 - 50yrs 01am 74
31 - 50yrs 02am 211
31 - 50yrs 03am 852
51 - 70yrs 01am 23
51 - 70yrs 02am 11
51 - 70yrs 03am 101

Expected output:

age_group trip_time counts
18 - 30yrs 03am 213
31 - 50yrs 03am 852
51 - 70yrs 03am 101
trip_time_age_group.groupby(['age_group', 'trip_time'])['counts'].max()

But it gives me wrong result

wjandrea
  • 28,235
  • 9
  • 60
  • 81
Khola
  • 81
  • 10

2 Answers2

2

groupby 'age-group' and then use transform to get the max count for each age-group, which you then compare with the count of the DF, to get the resultset

df.loc[df.groupby('age_group')['counts'].transform('max').eq(df['counts'])]
    age_group trip_time  counts
2  18 - 30yrs      03am     213
5  31 - 50yrs      03am     852
8  51 - 70yrs      03am     101
wjandrea
  • 28,235
  • 9
  • 60
  • 81
Naveed
  • 11,495
  • 2
  • 14
  • 21
1

Groupby only age_group, then find the rows in each group with the max counts.

df.groupby('age_group').apply(
    lambda sf: sf.loc[sf['counts'] == sf['counts'].max()]
).reset_index(drop=True)
    age_group trip_time  counts
0  18 - 30yrs      03am     213
1  31 - 50yrs      03am     852
2  51 - 70yrs      03am     101

You could also do sf.query('counts == counts.max()') instead of sf.loc[...].

wjandrea
  • 28,235
  • 9
  • 60
  • 81