Python pandas cumsum with reset every time there is a 0

Question

I have a matrix with 0s and 1s, and want to do a cumsum on each column that resets to 0 whenever a zero is observed. For example, if we have the following:

df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]],columns = ['a','b'])
print(df)
   a  b
0  0  1
1  1  1
2  0  1
3  1  0
4  1  1
5  0  1

The result I desire is:

   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  2  1
5  0  2

However, when I try df.cumsum() * df, I am able to correctly identify the 0 elements, but the counter does not reset:

print(df.cumsum() * df)
   a  b
0  0  1
1  1  2
2  0  3
3  2  0
4  3  4
5  0  5

score 27 · Accepted Answer · answered Aug 30 '17 at 16:00

You can use:

a = df != 0
df1 = a.cumsum()-a.cumsum().where(~a).ffill().fillna(0).astype(int)
print (df1)
   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  2  1
5  0  2
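To see why the one-liner above works, it can help to split it into named steps. The following is a minimal sketch on the question's sample data; the names running, at_last_zero and result are illustrative and not part of the answer. Note that a.cumsum() counts nonzero entries, which equals the sum only because the data holds 0s and 1s.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [1, 1], [0, 1], [1, 0], [1, 1], [0, 1]],
                  columns=['a', 'b'])

a = df != 0                  # True where the value is nonzero
running = a.cumsum()         # plain running count per column, never resets

# Keep the running count only on the zero rows, then carry it forward.
# Each row now holds the count that had accumulated at the most recent zero
# (0 for rows that come before the first zero).
at_last_zero = running.where(~a).ffill().fillna(0).astype(int)

# Subtracting leaves the count accumulated since the most recent zero.
result = running - at_last_zero
print(result)
```

For column a the running count is 0, 1, 1, 2, 3, 3 and the carried-forward count is 0, 0, 1, 1, 1, 3, so the difference is 0, 1, 0, 1, 2, 0.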

jezrael

Why does this work? I'm not sure if it will work for my situation. – Veggiet Nov 04 '21 at 14:31
@Veggiet - check [this](https://stackoverflow.com/questions/52717996/how-can-i-count-the-number-of-consecutive-trues-in-a-dataframe/52718619#52718619) for a similar solution with an explanation – jezrael Nov 04 '21 at 14:40
To do this across the columns instead of the rows, use: a = df.T != 0; df1 = (a.cumsum() - a.cumsum().where(~a).ffill().fillna(0).astype(int)).T – Anonymous May 23 '22 at 17:00

score 9 · Answer 2 · answered Aug 30 '17 at 15:53

Try this

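import pandas as pd

df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]],columns = ['a','b'])
# Each zero starts a new group, so the group id increases at every zero
df['groupId1']=df.a.eq(0).cumsum()
df['groupId2']=df.b.eq(0).cumsum()
New=pd.DataFrame()
# The cumulative sum within each group restarts after every zero
New['a']=df.groupby('groupId1').a.transform('cumsum')
New['b']=df.groupby('groupId2').b.transform('cumsum')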

New
Out[1184]: 
   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  2  1
5  0  2
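The same grouping idea can be applied to every column at once, without adding helper columns to df. This is a sketch only; the function name cumsum_reset_at_zero is hypothetical and not from the answer.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [1, 1], [0, 1], [1, 0], [1, 1], [0, 1]],
                  columns=['a', 'b'])

def cumsum_reset_at_zero(col: pd.Series) -> pd.Series:
    """Hypothetical helper: cumulative sum that restarts at every zero."""
    group_id = col.eq(0).cumsum()          # a new group begins at each zero
    return col.groupby(group_id).cumsum()  # cumsum restarts inside each group

out = df.apply(cumsum_reset_at_zero)       # applied column by column
print(out)
```

Because each group begins with the zero itself, the sum inside a group starts from 0, so this should also work when the nonzero values are not all 1.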

BENY

ok I see, you are enumerating every time you see a 0 to create separate groups, and then cumsum within each group. Makes sense. Thanks! – nanojohn Aug 30 '17 at 15:59
@nanojohn yes, groupid made for it. Yw~ – BENY Aug 30 '17 at 16:00
This looks nice and flexible – Veggiet Nov 04 '21 at 14:31

George Shimanovsky · Answer 3 · 2022-12-01T10:15:07.910

You may also try the following naive but reliable approach.

For each column, create the groups to count within. A new group starts whenever the value differs from the one in the previous row, and it lasts while the value stays constant: (x != x.shift()).cumsum().
Example:
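A minimal sketch of that grouping step on the sample frame used throughout this page (the name group_ids is only illustrative):

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [1, 1], [0, 1], [1, 0], [1, 1], [0, 1]],
                  columns=['a', 'b'])

# Label each run of equal consecutive values; the label increases by one
# whenever a value differs from the value in the previous row.
group_ids = df.apply(lambda x: (x != x.shift()).cumsum())
print(group_ids)
```

Column a gets the labels 1, 2, 3, 4, 4, 5 and column b gets 1, 1, 1, 2, 3, 3. Every run of ones and every run of zeros forms its own group, which is what makes the sum restart for 0/1 data.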

Compute the cumulative sums within these groups for each column using pd.DataFrame's apply and groupby methods, and you get a cumsum with the zero reset in one line:

import pandas as pd

df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]], columns = ['a','b'])

cs = df.apply(lambda x: x.groupby((x != x.shift()).cumsum()).cumsum())
print(cs)

   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  2  1
5  0  2

score 1 · Answer 4 · answered Aug 30 '17 at 16:03

A slightly hacky way would be to identify the indices of the zeros and set the corresponding values to the negative of those indices before doing the cumsum:

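import numpy as np
import pandas as pd

df = pd.DataFrame([[0,1],[1,1],[0,1],[1,0],[1,1],[0,1]],columns = ['a','b'])
z = np.where(df['b']==0)[0]    # row positions of the zeros in column b
df.loc[z, 'b'] = -z            # replace each zero with minus its position
df['b'] = np.cumsum(df['b'])   # only column b is processed here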
df

   a  b
0  0  1
1  1  2
2  0  3
3  1  0
4  1  1
5  0  2
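The index trick gives the right result for column b because its single zero is preceded only by ones, so the zero's index equals the running sum at that point. With several zeros in a column (as in column a) that no longer holds. A sketch of a variant that subtracts what has accumulated since the previous zero instead; the helper name cumsum_reset_np is hypothetical and not from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1], [1, 1], [0, 1], [1, 0], [1, 1], [0, 1]],
                  columns=['a', 'b'])

def cumsum_reset_np(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cumsum that restarts at every zero."""
    values = np.asarray(values)
    total = values.cumsum()                # running sum without any reset
    zeros = np.flatnonzero(values == 0)    # positions of the zeros
    adjusted = values.copy()
    # At each zero, subtract the amount accumulated since the previous zero,
    # so the running sum falls back to 0 at exactly that row.
    adjusted[zeros] = -np.diff(total[zeros], prepend=0)
    return adjusted.cumsum()

out = df.apply(
    lambda col: pd.Series(cumsum_reset_np(col.to_numpy()), index=col.index)
)
print(out)
```

For column a the adjusted values are 0, 1, -1, 1, 1, -2, whose cumulative sum is 0, 1, 0, 1, 2, 0.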
