I have the following data frame:

user_id value
1 5
1 7
1 11
1 15
1 35
2 8
2 9
2 14

I want to keep only the row with the maximum value for each user_id, which should result in a two-row data frame:

user_id value
1 35
2 14

How can I do that?

1 Answer

You can call max on the grouped data frame.
Assuming that your original data frame is named df, try the code below:

out = df.groupby('user_id', as_index=False)['value'].max()

>>> print(out)
   user_id  value
0        1     35
1        2     14
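To try this end to end, here is a minimal sketch that rebuilds the sample data from the question. How df is constructed is an assumption, since the question only shows its contents.

```python
import pandas as pd

# Sample data copied from the question; the constructor call is an assumption
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2, 2],
    "value": [5, 7, 11, 15, 35, 8, 9, 14],
})

# Group by user_id, select the value column, and take the largest value per group.
# as_index=False keeps user_id as a regular column instead of the index.
out = df.groupby("user_id", as_index=False)["value"].max()
print(out)
```

Selecting the column with ['value'] before calling max makes the intent explicit, because the first positional parameter of the groupby max method is numeric_only rather than a column name.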

Edit:

If you want to group by more than one column, use this:

out = df.groupby(['user_id', 'sex'], as_index=False, sort=False)['value'].max()


Timeless
  • `df.groupby('user_id').max('value')`? – wwii Aug 27 '22 at 21:23
  • Thank you. If I have a couple more columns, how do I keep them from being dropped from the result? – Enrique Martin Aug 27 '22 at 21:46
  • I added an edit at the end of my answer, check it out! – Timeless Aug 27 '22 at 21:52
  • @EnriqueMartin `nlargest` method can be helpful in your case: `df.groupby('user_id').apply(lambda x: x.nlargest(1, 'value'))` – Boris Silantev Aug 27 '22 at 22:39
  • Thank you, everyone. L'Artiste, your solution will not work because grouping by user_id and sex returns several rows per user_id, one for each existing combination of those two variables. Boris, your solution works perfectly. I did not know about the nlargest function, thank you very much! – Enrique Martin Aug 28 '22 at 14:25
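For the follow-up about keeping the other columns, another option is to select whole rows instead of aggregating. The sketch below is not from the thread: it uses idxmax and transform rather than nlargest, and the values in the sex column are invented for illustration.

```python
import pandas as pd

# Data from the question plus a hypothetical extra column; the "sex" values are invented
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2, 2],
    "value": [5, 7, 11, 15, 35, 8, 9, 14],
    "sex": ["F", "F", "F", "F", "F", "M", "M", "M"],
})

# idxmax gives, for each user_id, the index label of the row with the largest value.
# Passing those labels to .loc returns the full rows, so no column is dropped.
rows = df.loc[df.groupby("user_id")["value"].idxmax()]

# Variant that keeps every tied row: compare each value with its group's maximum.
group_max = df.groupby("user_id")["value"].transform("max")
rows_with_ties = df[df["value"] == group_max]

print(rows)
```

idxmax returns a single row per user_id even when the maximum is tied, while the transform variant keeps all tied rows, so which one fits depends on how ties should be handled.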