I am trying to do something very similar to this post, except that my outcomes come from a die (values 1-6), and I need to count streaks for every possible value of the die.
import numpy as np
import pandas as pd

data = [5,4,3,6,6,3,5,1,6,6]
df = pd.DataFrame(data, columns = ["Outcome"])
df.head(n=10)

def f(x):
    x['c'] = (x['Outcome'] == 6).cumsum()
    x['a'] = (x['c'] == 1).astype(int)
    x['b'] = x.groupby( 'c' ).cumcount()
    x['streak'] = x.groupby( 'c' ).cumcount() + x['a']
    return x

df = df.groupby('Outcome', sort=False).apply(f)
print(df.head(n=10))
   Outcome  c  a  b  streak
0        5  0  0  0       0
1        4  0  0  0       0
2        3  0  0  0       0
3        6  1  1  0       1
4        6  2  0  0       0
5        3  0  0  1       1
6        5  0  0  1       1
7        1  0  0  0       0
8        6  3  0  0       0
9        6  4  0  0       0
My problem is that 'c' does not behave as intended: it should reset its counter every time a streak breaks, otherwise 'a' and 'b' won't be correct.
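A counter that resets whenever the streak breaks can be sketched by flagging every row whose outcome differs from the previous row and taking a cumulative sum of those flags. This is a common pandas idiom rather than something from the linked post, and the column name `run_id` is just an illustrative choice. Each consecutive run of equal rolls gets its own id, so it could serve as the grouping key in place of 'c':

```python
import pandas as pd

# Same die outcomes as in the question
df = pd.DataFrame({"Outcome": [5, 4, 3, 6, 6, 3, 5, 1, 6, 6]})

# True wherever a roll differs from the previous roll, i.e. a new streak starts.
# The first row is compared against NaN, so it always counts as a start.
new_run = df["Outcome"].ne(df["Outcome"].shift())

# The cumulative sum of the start flags only increments when the streak breaks,
# so every consecutive run of identical rolls shares one id.
df["run_id"] = new_run.cumsum()
print(df)
```

For the sample data the ids come out as 1, 2, 3, 4, 4, 5, 6, 7, 8, 8: the two pairs of sixes each share an id, while the sixes at rows 3 and 8 end up in different runs.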
Ideally, I would like something elegant like
def f(x):
    x['streak'] = (x.groupby((x['stat'] != 0).cumsum()).cumcount() +
                   ((x['stat'] != 0).cumsum() == 0).astype(int))
    return x
as suggested in the linked post.
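A minimal sketch of that elegant form adapted to the die might look like the following. It assumes a streak means consecutive identical rolls, replaces the `!= 0` test from the linked post with a comparison against the previous roll, and counts streak positions starting at 1 (drop the `+ 1` for 0-based counts). It works on the whole frame, so no `groupby('Outcome').apply` is needed; `run_id`, `longest` and `n_streaks` are illustrative names, not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({"Outcome": [5, 4, 3, 6, 6, 3, 5, 1, 6, 6]})

# Id of the consecutive run each row belongs to (increments when the roll changes)
run_id = df["Outcome"].ne(df["Outcome"].shift()).cumsum()

# Position within the current run: 1 for the first roll of a streak, 2 for the second, ...
df["streak"] = df.groupby(run_id).cumcount() + 1

# Longest streak observed for each die value
longest = df.groupby("Outcome")["streak"].max()

# Number of separate streaks per die value (each streak starts where streak == 1)
n_streaks = df.loc[df["streak"] == 1, "Outcome"].value_counts()

print(df)
print(longest)
print(n_streaks)
```

For the sample data, `streak` comes out as 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, and only the value 6 reaches a streak longer than one roll.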