
I have a Series that looks like the following:

   col
0  B
1  B
2  A
3  A
4  A
5  B

It's a time series, therefore the index is ordered by time.

For each row, I'd like to count how many times the value has appeared consecutively, i.e.:

Output:

   col count
0  B   1
1  B   2
2  A   1 # Value does not match previous row => reset counter to 1
3  A   2
4  A   3
5  B   1 # Value does not match previous row => reset counter to 1

I found two related questions, but I can't figure out how to "write" that information as a new column in the DataFrame for each row (as above). Using rolling_apply does not work well.

Related:

Counting consecutive events on pandas dataframe by their index

Finding consecutive segments in a pandas data frame

– justinlevol

5 Answers


I think there is a nice way to combine the solutions of @chrisb and @CodeShaman (as was pointed out, CodeShaman's solution counts total rather than consecutive values).

df['count'] = df.groupby((df['col'] != df['col'].shift(1)).cumsum()).cumcount() + 1

  col  count
0   B      1
1   B      2
2   A      1
3   A      2
4   A      3
5   B      1
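
To see what the grouping key is doing, here is a small sketch added for illustration (not part of the original answer), using the sample data from the question: comparing the column with its shifted version flags the start of each run, and the cumulative sum turns those flags into a run id for the groupby.

import pandas as pd

df = pd.DataFrame({'col': list('BBAAAB')})

# True wherever the value differs from the previous row, i.e. a new run starts
new_run = df['col'] != df['col'].shift(1)

# cumulative sum of the flags yields one id per consecutive run: 1 1 2 2 2 3
run_id = new_run.cumsum()

# number the rows within each run; cumcount() starts at 0, hence the + 1
df['count'] = df.groupby(run_id).cumcount() + 1
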
– P.Tillmann

One-liner:

df['count'] = df.groupby('col').cumcount()

or

df['count'] = df.groupby('col').cumcount() + 1

if you want the counts to begin at 1.
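
For comparison, on the sample Series from the question this produces the following (a sketch added here; note that the last B gets 3, because cumcount numbers every occurrence of a value within its group rather than only consecutive ones, which is the caveat pointed out in the answer above):

import pandas as pd

df = pd.DataFrame({'col': list('BBAAAB')})
df['count'] = df.groupby('col').cumcount() + 1

  col  count
0   B      1
1   B      2
2   A      1
3   A      2
4   A      3
5   B      3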

– CodeShaman

Based on the second answer you linked, assuming s is your Series:

df = pd.DataFrame(s)
df['block'] = (df['col'] != df['col'].shift(1)).astype(int).cumsum()
df['count'] = df.groupby('block').transform(lambda x: range(1, len(x) + 1))


In [88]: df
Out[88]: 
  col  block  count
0   B      1      1
1   B      1      2
2   A      2      1
3   A      2      2
4   A      2      3
5   B      3      1
– chrisb

I like the answer by @chrisb, but I wanted to share my own solution, since some people might find it more readable and easier to apply to similar problems.

1) Create a function that uses static variables

def rolling_count(val):
    if val == rolling_count.previous:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count

rolling_count.count = 0  # static variable
rolling_count.previous = None  # static variable

2) Apply it to your Series after converting it to a DataFrame

df = pd.DataFrame(s)
df['count'] = df['col'].apply(rolling_count)  # new column in the DataFrame

Output of df:

  col  count
0   B      1
1   B      2
2   A      1
3   A      2
4   A      3
5   B      1
– ZJS
  • This is a great answer... How would you modify it to work if the rolling count was based on two fields, for example 'cola' and 'colb'? In that case, if either changed I'd want to see a reset of the counter. Thank you, this would save me a ton of work. – Stumbling Through Data Science Jun 11 '16 at 19:32
  • Nice! This works for a dataframe with any number of other columns. @chrisb's solution will only work with one column in the dataframe (when I run it on my dataframe, I get the error `ValueError: Wrong number of items passed 4, placement implies 1`). – horcle_buzz Oct 03 '16 at 20:23

If you wish to do the same thing but reset the count based on two columns, you can use this.

def count_consecutive_items_n_cols(df, col_name_list, output_col):
    cum_sum_list = [
        (df[col_name] != df[col_name].shift(1)).cumsum().tolist() for col_name in col_name_list
    ]
    df[output_col] = df.groupby(
        ["_".join(map(str, x)) for x in zip(*cum_sum_list)]
    ).cumcount() + 1
    return df
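
A usage sketch added for illustration (the DataFrame construction and the column names col_a / col_b are inferred from the output below):

import pandas as pd

df = pd.DataFrame({'col_a': [1, 1, 1, 2, 2, 2],
                   'col_b': list('BBAAAB')})

# the counter restarts whenever either col_a or col_b changes from the previous row
df = count_consecutive_items_n_cols(df, ['col_a', 'col_b'], 'count')

which produces: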

   col_a col_b  count
0      1     B      1
1      1     B      2
2      1     A      1
3      2     A      1
4      2     A      2
5      2     B      1
– Benjamin Breton