drop rows based on a condition based on another

Question

I have the following data frame

I want to drop all rows that do not contain the maximum value for each user_id

resulting in a two-row data frame:

user_id	value
1	35
2	14

How can I do that?

Which part are you having trouble with? – wwii Aug 27 '22 at 21:28

Timeless · Answer 1 · 2022-08-27T21:51:39.767

1

You can group by user_id and take the maximum of value within each group.
Assuming that your original dataframe is named df, try the code below:

out = df.groupby('user_id', as_index=False)['value'].max()
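
Since the original data frame is not shown, here is a minimal runnable sketch on made-up data. The values are an assumption, chosen only so that the result matches the two rows you expect; the column is selected before calling max() so that only value is aggregated.

```python
import pandas as pd

# Hypothetical sample data; the question's real frame is not shown.
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "value": [10, 35, 20, 14, 3],
})

# Select the column first, then take the maximum within each user_id group.
out = df.groupby("user_id", as_index=False)["value"].max()
print(out)
```

This should leave one row per user_id, holding that user's largest value.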

If you want to group by more than one column, use this:

out = df.groupby(['user_id', 'sex'], as_index=False, sort=False)['value'].max()
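
If the extra columns should only be carried along rather than grouped on, one possible alternative is to look up the index of the maximum row per user_id with idxmax and select those rows with loc. This is a sketch on made-up data: the sex values and the numbers are assumptions, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data with an extra column that should be kept.
df = pd.DataFrame({
    "user_id": [1, 1, 2, 2],
    "sex": ["F", "F", "M", "M"],
    "value": [10, 35, 14, 3],
})

# For each user_id, find the index label of the row with the largest value.
idx = df.groupby("user_id")["value"].idxmax()

# Select those rows; every original column is preserved.
out = df.loc[idx].reset_index(drop=True)
print(out)
```

When several rows of a user tie for the maximum, idxmax picks the first one, so the result still has exactly one row per user_id.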

edited Aug 27 '22 at 21:51

answered Aug 27 '22 at 21:19

Timeless

1

`df.groupby('user_id').max('value')` ? – wwii Aug 27 '22 at 21:23
Thank you. If I have a couple more columns, how do I keep them from being dropped from the result? – Enrique Martin Aug 27 '22 at 21:46
I added an EDIT at the end of my answer, check it out! – Timeless Aug 27 '22 at 21:52
@EnriqueMartin `nlargest` method can be helpful in your case: `df.groupby('user_id').apply(lambda x: x.nlargest(1, 'value'))` – Boris Silantev Aug 27 '22 at 22:39
Thank you everyone. L'Artiste, your solution will not work because grouping by user_id and sex returns one row per combination of those two variables, so a user_id can still appear in several rows. Boris, your solution works perfectly; I did not know the nlargest function, thank you very much! – Enrique Martin Aug 28 '22 at 14:25
