
I have written code that deletes all the rows after the first occurrence of the value 1 in target, but I want it to delete the rows after the last occurrence instead.


I want to delete the 0's that come after the last 1, per user_id.

How can I change this piece of code to get that?

df = df[df.groupby('user_id')['target'].apply(lambda x: x.shift().eq(1).cumsum().eq(0))]

  • Hi Michal, please have a look at [How to create a MWE](https://stackoverflow.com/a/20159305/12242625) and provide some data. Thanks – Marco_CH Jan 19 '22 at 10:06

1 Answer


In your solution, remove shift. To drop the trailing 0s per group, reverse the order of the values in each group with iloc[::-1], build the mask, then reverse it back:

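# group_keys=False keeps the original index on the result, so the mask
# still aligns with df on recent pandas versions (2.0 and later)
mask = (df.groupby('user_id', group_keys=False)['target']
          .apply(lambda x: x.iloc[::-1].eq(1).cumsum().ne(0).iloc[::-1]))
df = df[mask]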
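To see why the reversal works, here is a minimal sketch of the same steps on a single group. The values in x are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical target values for one user (not from the question)
x = pd.Series([0, 1, 0, 1, 0, 0])

rev = x.iloc[::-1]                # walk the group from its last row
seen_one = rev.eq(1).cumsum()     # count of 1s met so far, counted from the end
keep = seen_one.ne(0).iloc[::-1]  # True once a 1 has been met; restore row order

print(keep.tolist())
# [True, True, True, True, False, False]
```

The count stays at 0 only for the rows behind the last 1, so those are exactly the rows the mask drops.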

For better performance, if target contains only 0 and 1 values, you can skip apply and use:

mask = df.iloc[::-1].groupby('user_id')['target'].cumsum().ne(0).iloc[::-1]
df = df[mask]
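As a rough illustration, this is how the vectorized mask behaves on a small frame. The sample data is an assumption, since the question's data was only posted as an image.

```python
import pandas as pd

# Hypothetical sample data; target holds only 0 and 1
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "target":  [0, 1, 0, 1, 0, 1, 0, 0, 0],
})

# Reverse the frame, take a running sum per user, and flag the rows where
# at least one 1 has been met; then restore the original row order
mask = df.iloc[::-1].groupby("user_id")["target"].cumsum().ne(0).iloc[::-1]
result = df[mask]

print(result.index.tolist())
# [0, 1, 2, 3, 5]
```

User 1 loses only the final 0 (row 4), and user 2 loses the three 0s after its single 1.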

If target can also contain values other than 0 and 1, compare with 1 first and use:

mask = (df.iloc[::-1]
          .assign(new = lambda x: x['target'].eq(1))
          .groupby('user_id')['new']
          .cumsum().ne(0)
          .iloc[::-1])
df = df[mask]
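The difference matters because a running sum over the raw column treats any non-zero value as if it were a 1. A small sketch with made-up data (the value 2 is hypothetical) compares the two masks:

```python
import pandas as pd

# Hypothetical sample data where target also contains the value 2
df = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2],
    "target":  [0, 1, 2, 0, 1, 0, 2],
})
rev = df.iloc[::-1]

# Summing the raw values: the 2s keep the running sum non-zero
naive = rev.groupby("user_id")["target"].cumsum().ne(0).iloc[::-1]

# Summing only the "equals 1" flags, as in the snippet above
mask = (rev.assign(new=lambda x: x["target"].eq(1))
           .groupby("user_id")["new"]
           .cumsum().ne(0)
           .iloc[::-1])

print(df[naive].index.tolist())  # [0, 1, 2, 4, 5, 6]
print(df[mask].index.tolist())   # [0, 1, 4]
```

Note that with the eq(1) version every row after the last 1 is dropped, including the rows holding 2, not just the 0s.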

The masks above also remove every row of a group that contains only 0s. If you need to keep such groups, use:

mask = df.groupby('user_id')['target'].transform('any')

mask1 = (df.iloc[::-1]
          .assign(new = lambda x: x['target'].eq(1))
          .groupby('user_id')['new']
          .cumsum().ne(0)
          .iloc[::-1])
df = df[~mask | mask1]
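A short sketch of the effect, again on invented data: user 2 has no 1 at all, so mask1 alone would drop the whole group, while the combined condition keeps it. The names has_any and result are chosen here for readability and are not from the answer.

```python
import pandas as pd

# Hypothetical sample data; user 2 never has target == 1
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "target":  [1, 0, 0, 0, 0],
})

# True for every row of a group that has at least one non-zero target
has_any = df.groupby("user_id")["target"].transform("any")

mask1 = (df.iloc[::-1]
           .assign(new=lambda x: x["target"].eq(1))
           .groupby("user_id")["new"]
           .cumsum().ne(0)
           .iloc[::-1])

result = df[~has_any | mask1]

print(df[mask1].index.tolist())  # [0]
print(result.index.tolist())     # [0, 3, 4]
```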
jezrael