Cumsum within group and reset on condition in pandas

Question

I have a dataframe with two columns ID and Activity. The activity is either 0 or 1. I want a new column containing a increasing number since the last activity was 1. However, the count should only be within one group (ID). If the activity is 1, the counting column should be reset to 0 and the count starts again.

So, I have a dataframe containing the following:

What is want is this:

Can someone help me?

Don't post images of data, post data. If you post an image, you're saying other people should spend time typing your data in just to reproduce your situation. If you post it as text, people can simple copy/paste. — DSM, Dec 10 '17 at 18:11

BENY · Answer 1 · 2017-12-14T02:14:11.057

We using a new para 'G' here

df['G']=df.groupby('ID').Activeity.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)

df.groupby([df.ID,df.G.cumsum()]).G.apply(lambda x : (~x).cumsum())

Out[713]: 
0     1
1     2
2     0
3     1
4     2
5     1
6     2
7     0
8     1
9     0
10    1
11    1
12    0
13    0
14    1
15    2
Name: G, dtype: int32

Data input

df=pd.DataFrame({'ID':list('AAAAABBBBBBCCCCC'),'Activeity':[0,0,1,0,0,0,0,1,0,1,0,0,1,1,0,0]})

Explanation :

Here we get the new para 'G'
df['G']=df.groupby('ID').Activeity.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)
df
Out[134]: 
    Activeity ID      G
0           0  A  False
1           0  A  False
2           1  A   True
3           0  A  False
4           0  A  False
5           0  B  False
6           0  B  False
7           1  B   True
8           0  B  False
9           1  B   True
10          0  B  False
11          0  C  False
12          1  C   True
13          1  C   True
14          0  C  False
15          0  C  False

Then we do cumsum for G, is to getting where is the cycle we should set the number to 0

df.G.cumsum()
Out[135]: 
0     0
1     0
2     1
3     1
4     1
5     1
6     1
7     2
8     2
9     3
10    3
11    3
12    4
13    5
14    5
15    5
Name: G, dtype: int32

This just works! Can you please explain what logic is being applied here? It especially gets confusing when we are grouping on a `cumsum` in the second statement. — sriramn, Dec 14 '17 at 01:57
@sriramn cumsum in groupby is to get the group ID, I will update the answer adding more explanation — BENY, Dec 14 '17 at 02:11
Actually, you don't even need the additional column. Cumulative sum can be done right on `Activeity`. In lambda function switch 0 and 1 using xor. Here is the final oneliner: `df.groupby(['ID', df.Activeity.cumsum()]).Activeity.apply(lambda x: (x ^ 1).cumsum())` — Břetislav Hájek, May 30 '19 at 21:12

Cumsum within group and reset on condition in pandas

1 Answers1

Linked