
I'm working with a DataFrame (df) that has two columns, e.g.:

column1 = [False, False, False, True, False, False, True]
column2 = [1, 1, 1, 1, 1, 1, 1]

I want to sum the column2 values of the "False" rows up to the first "True" value, then start over and sum the following "False" rows up to the next "True", and so on.

The output should be

column3 = [0,0,0,3,0,0,2]

I tried summing the column values, but I can't "reset" the counter when a "True" appears in the other column.

  • @timegb not sure this is a correct duplicate, the logic seems more complex – mozway Dec 07 '22 at 09:03
  • @mozway OP will get most of the heavy lifting done with the dupe. Since OP didn't provide any attempt I'd expect them to open a new question with an attempt if there are remaining issues. – timgeb Dec 07 '22 at 09:05
  • @ידיה שוואלם - Do you sum the `False` values? Or the values in `column2`? Or is `column2` always 1? – jezrael Dec 07 '22 at 09:20
  • In other words, is `column2` important for the output? – jezrael Dec 07 '22 at 09:21
  • @jezrael I sum the column2 values; I apologize that it was unclear. In the example above, if column2 = [3, 2, 1, 1, 1, 1, 1], then column3 = [0,0,0,6,0,0,2] – Yedaya Schwalm Dec 07 '22 at 13:31

1 Answer


You can use:

df['column3'] = (df['column2']
 .mask(df['column1']) # get False values only
 .groupby(df.loc[::-1, 'column1'].cumsum()) # group with next True
 # get sum of False values only where True
 .transform('sum').where(df['column1'], 0).convert_dtypes()
)

Output:

   column1  column2  column3
0    False        1        0
1    False        1        0
2    False        1        0
3     True        1        3
4    False        1        0
5    False        1        0
6     True        1        2
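
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same chain, split into named steps. The intermediate names (`false_only`, `group_id`, `totals`) are only illustrative, and the data uses the `column2 = [3, 2, 1, 1, 1, 1, 1]` variant from the comments. It also assumes `column2` has no missing values, so a plain `astype(int)` is used in place of `convert_dtypes()`.

```python
import pandas as pd

# Example data: column1 from the question, column2 from the OP's comment.
df = pd.DataFrame({
    "column1": [False, False, False, True, False, False, True],
    "column2": [3, 2, 1, 1, 1, 1, 1],
})

# Step 1: blank out column2 wherever column1 is True,
# so only the "False" rows contribute to the sums.
false_only = df["column2"].mask(df["column1"])

# Step 2: a cumulative count of True values taken from the bottom up gives
# each row the same label as the next True row at or below it.
# groupby aligns this reversed Series back to df by index.
group_id = df.loc[::-1, "column1"].cumsum()

# Step 3: sum within each group, then keep the total only on the True rows.
totals = false_only.groupby(group_id).transform("sum")
df["column3"] = totals.where(df["column1"], 0).astype(int)

print(df["column3"].tolist())  # [0, 0, 0, 6, 0, 0, 2]
```

Any "False" rows after the last "True" end up in a group of their own and are set to 0 by the final `where`, since there is no "True" row to carry their sum.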
mozway