Filter rows with any one fulfills conditions in each group with pandas method chaining

Question

For the record, I've read these following threads but none of them seems to fulfill my need:

Say I have this following table df:

 user_id  is_manually  created_per_week
----------------------------------------
    10       True             59
    10      False             90
    33       True              0
    33      False             64
    50       True              0
    50      False              0

I want to exclude the users who have created nothing, i.e. created_per_week is 0 in both rows of is_manually True and False, which is user 50 in this case.

 user_id  is_manually  created_per_week
----------------------------------------
    10       True             59
    10      False             90
    33       True              0
    33      False             64

I learned that df.groupby doesn't have query method and should use apply instead.

The closest answer I've got is df.groupby("user_id").apply(lambda x: x[x["created_per_week"] > 0]), but it also excludes the row of user 33 manually True, which is undesirable. I've also tried df.groupby("user_id").apply(lambda x: x[any(x["created_per_week"] > 0)]) but it returns a KeyError.

In other words, I am searching the equivalence of df %>% group_by(user_id) %>% filter(any(created_per_week > 0)) in R. Thanks.

BENY · Answer 1 · 2018-03-21T03:38:17.630

transform + any

df[df.assign(New=df.created_per_week==0).groupby('user_id').created_per_week.transform('any')]
Out[425]: 
   user_id  is_manually  created_per_week
0       10         True                59
1       10        False                90
2       33         True                 0
3       33        False                64

Or simply by using loc+isin

df.loc[df.user_id.isin(df[df.created_per_week!=0].user_id)]
Out[426]: 
   user_id  is_manually  created_per_week
0       10         True                59
1       10        False                90
2       33         True                 0
3       33        False                64

From PiR

f, u = pd.factorize(df.user_id); df[np.bincount(f, df.created_per_week)[f] > 0]

You can add this `f, u = pd.factorize(df.user_id); df[np.bincount(f, df.created_per_week)[f] > 0]` — piRSquared, Mar 21 '18 at 03:18

score 0 · Accepted Answer · answered Mar 21 '18 at 02:49

0

You can apply groupby then filter command to get the output.

df.groupby('user_id').filter(lambda x: (x['created_per_week'] != 0).any())

    user_id is_manually created_per_week
0       10        True                59
1       10       False                90
2       33        True                 0
3       33       False                64

answered Mar 21 '18 at 02:49

YOLO

20,181
5
20
40

So `DataFrameGroupBy.filter` does quite different thing with `DataFrame.filter`... How confusing. Thanks, anyway! – ytu Mar 21 '18 at 03:21
Yes, `DataFrame.filter` is for selecting column or indexes based on their names. This one is to select rows given a condition. It's confusing. – YOLO Mar 21 '18 at 10:51

Filter rows with any one fulfills conditions in each group with pandas method chaining

2 Answers2