7

I have a dataframe with two columns ID and Activity. The activity is either 0 or 1. I want a new column containing a increasing number since the last activity was 1. However, the count should only be within one group (ID). If the activity is 1, the counting column should be reset to 0 and the count starts again.

So, I have a dataframe containing the following:

enter image description here

What is want is this:

enter image description here

Can someone help me?

Siem Peters
  • 193
  • 1
  • 12
  • it is really unclear to me .. – BENY Dec 10 '17 at 17:56
  • @Wen Hope this is more clear...? – Siem Peters Dec 10 '17 at 18:06
  • 5
    Don't post images of data, post data. If you post an image, you're saying other people should spend time typing your data in just to reproduce your situation. If you post it as text, people can simple copy/paste. – DSM Dec 10 '17 at 18:11

1 Answers1

6

We using a new para 'G' here

df['G']=df.groupby('ID').Activeity.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)

df.groupby([df.ID,df.G.cumsum()]).G.apply(lambda x : (~x).cumsum())

Out[713]: 
0     1
1     2
2     0
3     1
4     2
5     1
6     2
7     0
8     1
9     0
10    1
11    1
12    0
13    0
14    1
15    2
Name: G, dtype: int32

Data input

df=pd.DataFrame({'ID':list('AAAAABBBBBBCCCCC'),'Activeity':[0,0,1,0,0,0,0,1,0,1,0,0,1,1,0,0]})

Explanation :

Here we get the new para 'G'
df['G']=df.groupby('ID').Activeity.apply(lambda x :(x.diff().ne(0)&x==1)|x==1)
df
Out[134]: 
    Activeity ID      G
0           0  A  False
1           0  A  False
2           1  A   True
3           0  A  False
4           0  A  False
5           0  B  False
6           0  B  False
7           1  B   True
8           0  B  False
9           1  B   True
10          0  B  False
11          0  C  False
12          1  C   True
13          1  C   True
14          0  C  False
15          0  C  False

Then we do cumsum for G, is to getting where is the cycle we should set the number to 0

df.G.cumsum()
Out[135]: 
0     0
1     0
2     1
3     1
4     1
5     1
6     1
7     2
8     2
9     3
10    3
11    3
12    4
13    5
14    5
15    5
Name: G, dtype: int32
BENY
  • 317,841
  • 20
  • 164
  • 234
  • This just works! Can you please explain what logic is being applied here? It especially gets confusing when we are grouping on a `cumsum` in the second statement. – sriramn Dec 14 '17 at 01:57
  • 2
    @sriramn cumsum in groupby is to get the group ID, I will update the answer adding more explanation – BENY Dec 14 '17 at 02:11
  • Actually, you don't even need the additional column. Cumulative sum can be done right on `Activeity`. In lambda function switch 0 and 1 using xor. Here is the final oneliner: `df.groupby(['ID', df.Activeity.cumsum()]).Activeity.apply(lambda x: (x ^ 1).cumsum())` – Břetislav Hájek May 30 '19 at 21:12