0

How would you go about grouping a DataFrame based on one column, checking which is the highest value on another column within each group (or any other condition), and creating a new DataFrame with the rows of all values found during the grouping?

If I use groupby on the column of interest to then check the condition I want, the only thing I'll get back is a df with two columns, but I want to all columns of the original df.

Code:

df_best = df.groupby('Type 1')['Total'].max()

I know why my code above doesn't work in this case, but I can't figure out a way to get what I want...

Thanks in advance!

Sajan
  • 1,247
  • 1
  • 5
  • 13
martifapa
  • 107
  • 3
  • 12
  • This post might be helpful - https://stackoverflow.com/questions/47360510/pandas-groupby-and-aggregation-output-should-include-all-the-original-columns-i – Sajan Apr 29 '20 at 17:03
  • 2
    You typically use `groupby` + `transform` to broadcast the result back to the original DataFrame. This allows you to then create a Boolean Mask with a simple comparison. I.e. to find all rows in the original that are equal to to `max` 'Total' within each group you would do `df[df['Total'] == df.groupby('Type 1')['Total'].transform('max')]` – ALollz Apr 29 '20 at 17:14
  • 1
    If it's truly all rows max within group, then I think we'll close it as a duplicate of: https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby. Otherwise let us know if you need something different. Those answers vary a bit depending upon whether you need a single row, or possibly multiple per group. – ALollz Apr 29 '20 at 17:16
  • Hey ALollz, the groupby + transform worked! (For the groups which had more than one 'Total' max value it returned the correspondent amount of rows, but that's fine in my case!). Thanks a lot!! – martifapa Apr 30 '20 at 08:01

1 Answers1

0

What I finally did was:

best = df[df['Total'] == df.groupby('Type 1')['Total'].transform('max')]
best = best.sort_values('Type 1')

As ALollz commented, this finds all rows in the original df (after applying the groupby) that are equal to the condition established (in this case 'max'), and returns a new df based on the results.

ALollz, thanks a lot!

martifapa
  • 107
  • 3
  • 12