creating a column which keeps a running count of consecutive values

Question

I am trying to create a column (“consec”) which will keep a running count of consecutive values in another (“binary”) without using loop. This is what the desired outcome would look like:

.    binary consec
1       0      0
2       1      1
3       1      2
4       1      3
5       1      4
5       0      0
6       1      1
7       1      2
8       0      0

However, this...

df['consec'][df['binary']==1] = df['consec'].shift(1) + df['binary']

results in this...

.  binary   consec
0     1       NaN
1     1       1
2     1       1
3     0       0
4     1       1
5     0       0
6     1       1
7     1       1
8     1       1
9     0       0

I see other posts which use grouping or sorting, but unfortunately, I don't see how that could work for me.

You say "consecutive", but 0 and 1 are different and you still increase the counter. So it feels more like you want to reset the counter at each zero. What do you want to happen there are two zeros in a row? — DSM, Oct 14 '15 at 16:21
good point - poor description, I do indeed want to reset the counter at each zero. so consecutive 0s in df['binary'] would generate 0s in df['consec']. — MJS, Oct 14 '15 at 16:32

score 17 · Accepted Answer · answered Oct 14 '15 at 16:35

You can use the compare-cumsum-groupby pattern (which I really need to getting around to writing up for the documentation), with a final cumcount:

>>> df = pd.DataFrame({"binary": [0,1,1,1,0,0,1,1,0]})
>>> df["consec"] = df["binary"].groupby((df["binary"] == 0).cumsum()).cumcount()
>>> df
   binary  consec
0       0       0
1       1       1
2       1       2
3       1       3
4       0       0
5       0       0
6       1       1
7       1       2
8       0       0

This works because first we get the positions where we want to reset the counter:

>>> (df["binary"] == 0)
0     True
1    False
2    False
3    False
4     True
5     True
6    False
7    False
8     True
Name: binary, dtype: bool

The cumulative sum of these gives us a different id for each group:

>>> (df["binary"] == 0).cumsum()
0    1
1    1
2    1
3    1
4    2
5    3
6    3
7    3
8    4
Name: binary, dtype: int64

And then we can pass this to groupby and use cumcount to get an increasing index in each group.

Would be a nice addition to the documentation, possibly in the Cookbook. — Alexander, Oct 14 '15 at 16:45
mind blown. is there any reason this wouldn't work for a panel in place of the dataframe if pan['A'] = df ? — MJS, Oct 14 '15 at 17:12

score 3 · Answer 2 · answered Nov 25 '16 at 11:35

For those who ended up here looking for an answer to the "misunderstood" version: To reset count for each change in the binary column, so that consec does "keep a running count of consecutive values", the following seems to work:

df["consec2"] = df["binary"].groupby((df["binary"] <> df["binary"].shift()).cumsum()).cumcount()

creating a column which keeps a running count of consecutive values

2 Answers2

Linked