For a pandas DataFrame with groups, I want to keep all rows of each group up to and including the first occurrence of a specific value, and discard all rows after it.
MWE:
import pandas as pd
df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'tmp'],
                   'B': [0, 1, 0, 0, 0, 1, 0],
                   'C': [2.0, 5., 8., 1., 2., 9., 7.]})
gives
     A  B    C
0  foo  0  2.0
1  foo  1  5.0
2  foo  0  8.0
3  bar  0  1.0
4  bar  0  2.0
5  bar  1  9.0
6  tmp  0  7.0
and I want to keep all rows for each group (A is the grouping variable) until B == 1 (including this row). So, my desired output is
     A  B    C
0  foo  0  2.0
1  foo  1  5.0
3  bar  0  1.0
4  bar  0  2.0
5  bar  1  9.0
6  tmp  0  7.0
How can I keep all rows of a grouped DataFrame that meet a certain criterion?
I found how to drop whole groups that do not meet a certain criterion (while keeping all rows of all other groups), but not how to drop specific rows within every group. The furthest I got was to get, for each group, the index of the row up to which I want to keep the data:
df.groupby('A').apply(lambda x: x['B'].cumsum().searchsorted(1))
resulting in
A
bar    2
foo    1
tmp    1
This isn't sufficient, as it does not return the actual data (and it might be better if the result for tmp was 0).
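
For reference, the selection rule described above (keep each group's rows up to and including the first B == 1) can be sketched with a per-group cumulative sum instead of searchsorted. This is only a sketch of one possible approach under the assumptions of the MWE; the names hit, seen_before and result are illustrative and not part of any existing code.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'tmp'],
                   'B': [0, 1, 0, 0, 0, 1, 0],
                   'C': [2.0, 5., 8., 1., 2., 9., 7.]})

# 1 where the row matches the stop condition (B == 1), 0 otherwise
hit = df['B'].eq(1).astype(int)

# Per group: how many matching rows occurred strictly before the current row.
# Subtracting hit excludes the current row from its own running count,
# so the first matching row itself is still kept.
seen_before = hit.groupby(df['A']).cumsum() - hit

# Keep only rows that have no earlier match within their group
result = df[seen_before == 0]
print(result)
```

On the MWE this keeps the rows with index 0, 1, 3, 4, 5 and 6. A group without any B == 1 row (tmp here) is kept completely, and the original row order and index are preserved because the filter is a plain boolean mask.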