I have a dataframe like the one below. The column I care about is named target:

group    value    target

  1        1        0
  1        2        0
  1        3        2
  1        4        0
  1        5        1
  2        1        0
  2        2        0
  2        3        0
  2        4        1
  2        5        3

For each group, I want to find the first non-zero value in the target column and remove all rows that come before it in that group. So the output should look like this:

group    value    target

  1        3        2
  1        4        0
  1        5        1
  2        4        1
  2        5        3

I have seen this post, but I don't know how to change the code to get my desired result.
How can I do this?

Mahdi

2 Answers

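In the groupby, set sort to False, take the cumulative sum of target, then keep the rows where that sum is not equal to 0. Within each group the running sum stays at 0 until the first non-zero value appears, so the filter drops only the leading zero rows (this assumes target has no negative values that could bring the sum back to 0):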

df.loc[df.groupby(["group"], sort=False).target.cumsum() != 0]

   group  value  target
2      1      3       2
3      1      4       0
4      1      5       1
8      2      4       1
9      2      5       3
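
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the example frame from the question and applies the cumulative-sum filter. The second variant is not part of the original answer; it is an assumption-labelled alternative that counts non-zero entries instead of summing raw values, so it should also hold up if target ever contains negative numbers. The names running, seen_nonzero, and result_robust are purely illustrative.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({
    "group":  [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "value":  [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "target": [0, 0, 2, 0, 1, 0, 0, 0, 1, 3],
})

# Running sum of target within each group; it is 0 only for the leading zero rows
running = df.groupby("group", sort=False)["target"].cumsum()
result = df.loc[running != 0]

# Alternative (assumption, not from the answer): count non-zero entries seen so far,
# which cannot cancel back to 0 when target holds negative values
seen_nonzero = (
    df["target"].ne(0).astype(int)
    .groupby(df["group"], sort=False)
    .cumsum()
    .gt(0)
)
result_robust = df.loc[seen_nonzero]

print(result)
```

On the question's data both variants keep the same rows; they only differ when positive and negative values in target happen to cancel out.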
sammywemmy

This should do. I'm sure it can be done with fewer reset_index() calls, but that shouldn't affect speed much unless your dataframe is very large:

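# Index label of the first non-zero target row in each group
idx = dff[dff.target.ne(0)].reset_index().groupby('group')['index'].first()
# True for rows whose index label is at or after their group's first non-zero label
mask = (dff.reset_index().set_index('group')['index'].ge(idx.to_frame()['index'])).values
df_final = dff[mask]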

Output:

0  group value  target
3      1     3       2
4      1     4       0
5      1     5       1
9      2     4       1
10     2     5       3
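
The same idea can probably be written with a single reset_index() call. The sketch below is not the code from this answer; it assumes a default, unnamed, increasing integer index (so reset_index() produces a column called index), and the names first_label and threshold are made up for illustration.

```python
import pandas as pd

# Example frame from the question (default RangeIndex assumed)
df = pd.DataFrame({
    "group":  [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "value":  [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "target": [0, 0, 2, 0, 1, 0, 0, 0, 1, 3],
})

# Index label of the first non-zero target row in each group
first_label = (
    df[df["target"].ne(0)]
    .reset_index()
    .groupby("group")["index"]
    .first()
)

# Look up that label for every row via its group, then keep rows at or after it.
# Groups with no non-zero target map to NaN and are dropped entirely.
threshold = df["group"].map(first_label)
df_final = df[df.index.to_series().ge(threshold)]

print(df_final)
```

Mapping the per-group label back onto the rows avoids the second reset_index()/set_index() round trip, at the cost of relying on the index being sorted in row order.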
Juan C