I am trying to do something very similar to this post, except that my outcomes come from a die (values 1-6) and I need to count streaks across all possible values of the die.

import numpy as np
import pandas as pd

data = [5,4,3,6,6,3,5,1,6,6]
df = pd.DataFrame(data, columns = ["Outcome"])
df.head(n=10)

def f(x):

    x['c'] = (x['Outcome'] == 6).cumsum()
    x['a'] = (x['c'] == 1).astype(int)
    x['b'] = x.groupby( 'c' ).cumcount()

    x['streak'] = x.groupby( 'c' ).cumcount() + x['a']

    return x

df = df.groupby('Outcome', sort=False).apply(f)

print(df.head(n=10))

   Outcome  c  a  b  streak
0        5  0  0  0       0
1        4  0  0  0       0
2        3  0  0  0       0
3        6  1  1  0       1
4        6  2  0  0       0
5        3  0  0  1       1
6        5  0  0  1       1
7        1  0  0  0       0
8        6  3  0  0       0
9        6  4  0  0       0

My problem is that 'c' does not behave: it should reset its counter every time the streak breaks, otherwise 'a' and 'b' won't be correct.

Ideally, I would like something elegant like

def f(x):
    # Parentheses let the expression span two lines without a dangling "+".
    x['streak'] = (x.groupby((x['stat'] != 0).cumsum()).cumcount()
                   + ((x['stat'] != 0).cumsum() == 0).astype(int))
    return x

as suggested in the linked post.

tmo

1 Answer

Here's a solution with cumsum and cumcount, as mentioned, but not as "elegant" as expected (i.e., not a one-liner).

I start by labelling each run of consecutive equal values with a "block" number:

In [326]: df['block'] = (df['Outcome'] != df['Outcome'].shift(1)).astype(int).cumsum()

In [327]: df
Out[327]: 
   Outcome  block
0        5      1
1        4      2
2        3      3
3        6      4
4        6      4
5        3      5
6        5      6
7        1      7
8        6      8
9        6      8
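
To see why the comparison with shift labels the runs, the intermediate steps can be spelled out. This is only an illustrative sketch on the same ten rolls; the names outcomes and starts_run are mine, not part of the code above.

```python
import pandas as pd

# Same ten rolls as in the question.
outcomes = pd.Series([5, 4, 3, 6, 6, 3, 5, 1, 6, 6], name="Outcome")

# True wherever a value differs from the one before it. The first row is
# compared against NaN, so it always counts as the start of a run.
starts_run = outcomes != outcomes.shift(1)

# Each True bumps the running total by one, so every run of equal
# values ends up with its own label.
block = starts_run.cumsum()

print(pd.DataFrame({"Outcome": outcomes, "starts_run": starts_run, "block": block}))
```

Rows 4 and 9 are the only ones where starts_run is False, which is why they keep the block number of the row above them.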

Now that every run has its own block number, I just need a running count within each block:

In [328]: df['streak'] = df.groupby('block').cumcount()

In [329]: df
Out[329]: 
   Outcome  block  streak
0        5      1       0
1        4      2       0
2        3      3       0
3        6      4       0
4        6      4       1
5        3      5       0
6        5      6       0
7        1      7       0
8        6      8       0
9        6      8       1

If you want to start counting from 1, feel free to add + 1 to the last line.
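
For convenience, the two steps can be wrapped into one small helper. This is a sketch under my own assumptions: the function name add_streak and its col and start parameters are hypothetical, and it skips the intermediate block column by passing the labels straight to groupby.

```python
import pandas as pd

def add_streak(df, col="Outcome", start=0):
    """Return a copy of df with a 'streak' column counting consecutive repeats.

    Hypothetical helper: 'start' is the value given to the first row of
    each run (0 reproduces the output above, 1 counts from one).
    """
    # Label each run of consecutive equal values, as in the first step.
    block = (df[col] != df[col].shift(1)).cumsum()
    out = df.copy()
    # Running count within each run, as in the second step.
    out["streak"] = out.groupby(block).cumcount() + start
    return out

df = pd.DataFrame({"Outcome": [5, 4, 3, 6, 6, 3, 5, 1, 6, 6]})
result = add_streak(df)
result_from_one = add_streak(df, start=1)
print(result)
```

Because the runs are detected by comparing neighbouring rows, this works for any die value without grouping by Outcome first.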

3kt