Pandas - how to aggregate values between 2 ranges in a specific column

Question

I'm working with a DataFrame that has two columns, e.g.:

column1 = [False, False, False, True, False, False, True]
column2 = [1, 1, 1, 1, 1, 1, 1]

I want to sum the values on the `False` rows up to the first `True` and put that total on the `True` row. Then I want to start over, sum the following `False` rows up to the next `True`, and so on.

The expected output is:

column3 = [0,0,0,3,0,0,2]
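To pin down the rule, here is a plain-Python loop that reproduces the expected output. It assumes, as clarified in the comments below, that the summed values come from `column2`. `expected_column3` is a hypothetical helper introduced only as a reference for checking vectorized answers; it is not meant to be fast on large frames.

```python
def expected_column3(flags, values):
    """Hypothetical reference: move the sum of `values` on False rows
    onto the next True row; every other row gets 0."""
    out = []
    running = 0
    for flag, value in zip(flags, values):
        if flag:
            out.append(running)  # emit the total of the preceding False rows
            running = 0          # reset for the next run
        else:
            running += value     # accumulate False rows only
            out.append(0)
    return out


column1 = [False, False, False, True, False, False, True]
column2 = [1, 1, 1, 1, 1, 1, 1]
print(expected_column3(column1, column2))  # [0, 0, 0, 3, 0, 0, 2]
```

Any trailing `False` rows after the last `True` are accumulated but never emitted, so they stay 0.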

I tried summing the column values, but I can't find a way to "reset" the running total whenever the other column hits a `True`.
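One way to get that "reset" in pandas is sketched below, as an alternative to the accepted answer's approach. It builds a block id that increases on the row *after* each `True` (via `shift` and `cumsum`) and then takes a cumulative sum within each block. The names `block`, `contrib` and `running` are introduced here for illustration, and the sketch assumes the summed values are `column2`.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [False, False, False, True, False, False, True],
    'column2': [1, 1, 1, 1, 1, 1, 1],
})

# Block id: starts at 0 and increases by 1 on the row right after each True,
# so every True row closes the block made of the False rows before it.
block = df['column1'].shift(fill_value=False).astype(int).cumsum()

# Only False rows contribute to the total; True rows add 0.
contrib = df['column2'].where(~df['column1'], 0)

# The running total restarts at the beginning of every block.
running = contrib.groupby(block).cumsum()

# Keep the block total only on the True rows.
df['column3'] = running.where(df['column1'], 0)
print(df['column3'].tolist())  # [0, 0, 0, 3, 0, 0, 2]
```

Because the grouping key only moves forward, no reversal is needed. The `running` series also shows the reset counter itself (`[1, 2, 3, 3, 1, 2, 2]`) if the intermediate totals are of interest.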

@timegb not sure this is a correct duplicate, the logic seems more complex — mozway, Dec 07 '22 at 09:03
@mozway OP will get most of the heavy lifting done with the dupe. Since OP didn't provide any attempt I'd expect them to open a new question with an attempt if there are remaining issues. — timgeb, Dec 07 '22 at 09:05
@ידיה שוואלם - Do you sum the `False` values themselves? Or the values in `column2`? Or is it always `column2 == 1`? — jezrael, Dec 07 '22 at 09:20
@jezrael I sum the column2 values; I apologize, that was unclear. In the example above, if column2 = [3, 2, 1, 1, 1, 1, 1], then column3 = [0, 0, 0, 6, 0, 0, 2] — Yedaya Schwalm, Dec 07 '22 at 13:31

score -1 · Accepted Answer · answered Dec 07 '22 at 09:02

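You can hide the `True` rows, group each row with the next `True` using a `cumsum` over the reversed column, and keep each group's sum only on its `True` row: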

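import pandas as pd

df = pd.DataFrame({'column1': [False, False, False, True, False, False, True],
                   'column2': [1, 1, 1, 1, 1, 1, 1]})

df['column3'] = (df['column2']
 .mask(df['column1']) # keep column2 on False rows only (True rows become NaN)
 .groupby(df.loc[::-1, 'column1'].cumsum()) # group each row with the next True
 # sum each group (NaN skipped), keep the sum only on True rows, 0 elsewhere
 .transform('sum').where(df['column1'], 0).convert_dtypes()
)
print(df)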

Output:

   column1  column2  column3
0    False        1        0
1    False        1        0
2    False        1        0
3     True        1        3
4    False        1        0
5    False        1        0
6     True        1        2
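To see why the chain works, here it is split into named steps (the names `values`, `groups` and `totals` are introduced here for illustration), run on the second example from the comments, where `column2 = [3, 2, 1, 1, 1, 1, 1]` should give `[0, 0, 0, 6, 0, 0, 2]`. Reading the column bottom-up, the `cumsum` increases at each `True`, so each `True` row shares a group id with the `False` rows above it. `groupby` aligns the reversed key back to the original index before grouping.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [False, False, False, True, False, False, True],
    'column2': [3, 2, 1, 1, 1, 1, 1],
})

# Step 1: hide the True rows so only False rows are summed (True -> NaN).
values = df['column2'].mask(df['column1'])

# Step 2: count of True read bottom-up; each True row gets the same id
# as the False rows directly above it. groupby aligns this on the index.
groups = df.loc[::-1, 'column1'].cumsum()

# Step 3: broadcast each group's sum (NaN is skipped) back to every row.
totals = values.groupby(groups).transform('sum')

# Step 4: keep the totals on True rows only, 0 elsewhere, as nullable ints.
df['column3'] = totals.where(df['column1'], 0).convert_dtypes()
print(groups.sort_index().tolist())  # [2, 2, 2, 2, 1, 1, 1]
print(df['column3'].tolist())        # [0, 0, 0, 6, 0, 0, 2]
```

Any `False` rows after the last `True` land in group 0. They get a sum in step 3, but step 4 zeroes them out because no `True` row closes that group.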
