7

I am trying to create a column (“consec”) which will keep a running count of consecutive values in another (“binary”) without using loop. This is what the desired outcome would look like:

.    binary consec
1       0      0
2       1      1
3       1      2
4       1      3
5       1      4
5       0      0
6       1      1
7       1      2
8       0      0

However, this...

df['consec'][df['binary']==1] = df['consec'].shift(1) + df['binary']

results in this...

.  binary   consec
0     1       NaN
1     1       1
2     1       1
3     0       0
4     1       1
5     0       0
6     1       1
7     1       1
8     1       1
9     0       0

I see other posts which use grouping or sorting, but unfortunately, I don't see how that could work for me.

cottontail
  • 10,268
  • 18
  • 50
  • 51
MJS
  • 1,573
  • 3
  • 17
  • 26
  • You say "consecutive", but 0 and 1 are different and you still increase the counter. So it feels more like you want to reset the counter at each zero. What do you want to happen there are two zeros in a row? – DSM Oct 14 '15 at 16:21
  • good point - poor description, I do indeed want to reset the counter at each zero. so consecutive 0s in df['binary'] would generate 0s in df['consec']. – MJS Oct 14 '15 at 16:32

2 Answers2

17

You can use the compare-cumsum-groupby pattern (which I really need to getting around to writing up for the documentation), with a final cumcount:

>>> df = pd.DataFrame({"binary": [0,1,1,1,0,0,1,1,0]})
>>> df["consec"] = df["binary"].groupby((df["binary"] == 0).cumsum()).cumcount()
>>> df
   binary  consec
0       0       0
1       1       1
2       1       2
3       1       3
4       0       0
5       0       0
6       1       1
7       1       2
8       0       0

This works because first we get the positions where we want to reset the counter:

>>> (df["binary"] == 0)
0     True
1    False
2    False
3    False
4     True
5     True
6    False
7    False
8     True
Name: binary, dtype: bool

The cumulative sum of these gives us a different id for each group:

>>> (df["binary"] == 0).cumsum()
0    1
1    1
2    1
3    1
4    2
5    3
6    3
7    3
8    4
Name: binary, dtype: int64

And then we can pass this to groupby and use cumcount to get an increasing index in each group.

DSM
  • 342,061
  • 65
  • 592
  • 494
3

For those who ended up here looking for an answer to the "misunderstood" version: To reset count for each change in the binary column, so that consec does "keep a running count of consecutive values", the following seems to work:

df["consec2"] = df["binary"].groupby((df["binary"] <> df["binary"].shift()).cumsum()).cumcount()

enter image description here

user2738815
  • 1,196
  • 3
  • 12
  • 19