Extract rows from pandas dataframe based on condition

Question

I have the pandas dataframe "data", and want to keep only the rows where the sum of "numb_people" per category "class" is at least 2.

This, however, throws an index error (the indices do not match anymore):

data = data[data.groupby('class').sum()['numb_people'] > 2]

How can I do this in a similarly simple manner?

Please [provide a reproducible copy of the DataFrame with `to_clipboard`](https://stackoverflow.com/questions/52413246/provide-a-reproducible-copy-of-the-dataframe-with-to-clipboard/52413247#52413247) — Trenton McKinney, Oct 09 '19 at 01:46
`data[data.groupby('class').numb_people.transform('sum') > 2]` — rafaelc, Oct 09 '19 at 01:47
If I do `data = data[data.groupby('class').numb_people.transform('sum') > 2]`, does this threshold the data so that only classes with sum > 2 are left, or does the new `data` variable actually contain the sums (which it should not)? — TestGuest, Oct 09 '19 at 01:57
`groupby` expressions in pandas have the [`filter`](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#filtration) method that might make this code a little more elegant than using the `transform` method. It's pandas' closest equivalent to a SQL-like HAVING statement. — Jacob Turpin, Oct 09 '19 at 02:11

score 1 · Accepted Answer · answered Oct 09 '19 at 01:59


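As @rafaelc said in the comments, build a boolean mask with `transform` and use it to index `data`. The mask has one entry per original row, so the result contains the original rows, not the sums: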

idx = data.groupby('class').numb_people.transform('sum') > 2
print(data[idx])
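
To illustrate why this works where the original attempt failed, here is a minimal sketch on made-up data (the question did not include a sample, so the values below are assumptions). `groupby('class').sum()` returns one row per class, indexed by class label, which cannot be aligned with the rows of `data`. `transform('sum')` instead broadcasts each group's total back onto every row of that group, so the comparison yields a mask with the same index as `data`. The `filter` variant mentioned by Jacob Turpin in the comments is shown alongside for comparison.

```python
import pandas as pd

# Hypothetical sample data; the original question did not provide any.
data = pd.DataFrame({
    "class": ["a", "a", "b", "c", "c", "c"],
    "numb_people": [1, 2, 1, 1, 1, 1],
})

# transform('sum') returns one value per original row (that row's group total),
# so the boolean mask below shares data's index and can be used to select rows.
group_total = data.groupby("class")["numb_people"].transform("sum")
mask = group_total > 2
kept = data[mask]

# filter() keeps or drops whole groups based on a per-group predicate,
# similar to a SQL HAVING clause.
kept_via_filter = data.groupby("class").filter(
    lambda g: g["numb_people"].sum() > 2
)

print(kept)
```

In this sample, classes `a` and `c` each total 3 and are kept, while class `b` totals 1 and is dropped. Note that the question asks for "at least 2" while the code uses `> 2`; if the inclusive threshold is intended, use `>= 2` instead.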


oreopot
