1

I want to choose maximum in subgroups level. Here is data frame

df = pd.DataFrame({
'sp' : ['MM1', 'MM1', 'MM1', 'MM2', 'MM2', 'MM2', 'MM4', 'MM4','MM4'],
'mt' : ['S1', 'S1', 'S3', 'S3', 'S4', 'S4', 'S2', 'S2', 'S2'],
'count' : [3,2,5,8,10,1,2,2,7]
})

I tried

df.groupby(["sp", "mt"])['count'].max()

Which result in

sp   mt
MM1  S1     3
     S3     5
MM2  S3     8
     S4    10
MM4  S2     7

But I want to further choose max in subgroups

sp   mt
MM1  S3     5
MM2  S4     10
MM4  S2     7
Talha Anwar
  • 2,699
  • 4
  • 23
  • 62
  • Does this answer your question? [Get the row(s) which have the max value in groups using groupby](https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby) – Epic Chen Jun 16 '21 at 16:40

1 Answers1

2

You can use idxmax and use it in loc:

df.loc[df.groupby(["sp"])['count'].idxmax()]

    sp  mt  count
2  MM1  S3      5
4  MM2  S4     10
8  MM4  S2      7

idxmax returns the index of the max value, if you then .locate the indexes in the original dataframe you get all values associated with it.

Andreas
  • 8,694
  • 3
  • 14
  • 38
  • i tried this, but passing both 'sp' and 'mt' in groupby, because I think I am creating groupby on both. because both have relation – Talha Anwar Jun 16 '21 at 15:52
  • @TalhaAnwar I am not sure what you mean by that, please elaborate. It would probaly help if you could provide an example output where the difference matters, because I don't see any reason to use `mt` in grouping if you ignore it for the max value anyway. – Andreas Jun 16 '21 at 15:54
  • @TalhaAnwar - You're only keeping the `max` for each "sp", you don't have to `groupby` "mt". – not_speshal Jun 16 '21 at 15:57
  • actually in actual dataset, sp is year, and mt is month. i want to know in which year and month max value occurs, the year month relation is confusing me – Talha Anwar Jun 16 '21 at 15:59
  • 1
    @TalhaAnwar but then the shown solution is completely viable because you dont actually group by month, you group by year. You want to know the max per year and see which month it was. That is why in the above solution we find the id of the maximum month, and then filter the dataframe to only show those month with max values. – Andreas Jun 16 '21 at 18:51