Create a helper Series and pass it to GroupBy.cumcount for the counter:
m = df['events'].eq('no_event')
g = (m.ne(m.shift()) & m).cumsum()
df['RN2'] = df.groupby(['user', g]).cumcount().add(1)
print (df)
  user    events  RN2
0    a  no_event    1
1    a  no_event    2
2    a       aaa    3
3    a  no_event    1
4    a       bbb    2
5    b      asdf    1
6    b  no_event    1
7    b      ghtu    2
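For reference, a frame matching the output above (presumably the question's sample data, rebuilt here only so the snippet is runnable as-is):

import pandas as pd

# hypothetical reconstruction of the sample data, taken from the printed output above
df = pd.DataFrame({'user':   ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b'],
                   'events': ['no_event', 'no_event', 'aaa', 'no_event',
                              'bbb', 'asdf', 'no_event', 'ghtu']})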
Explanation:
- First compare by eq with no_event
- Then shift the values
- Compare for not equal
- Chain with the mask m to keep only no_event rows
- Create consecutive group ids by cumulative sum
print (df.assign(mask = df['events'].eq('no_event'),
                 shifted = m.shift(),
                 not_q = m.ne(m.shift()),
                 chained = (m.ne(m.shift()) & m),
                 consecut_gr = (m.ne(m.shift()) & m).cumsum()))
  user    events   mask shifted  not_q  chained  consecut_gr
0    a  no_event   True     NaN   True     True            1
1    a  no_event   True    True  False    False            1
2    a       aaa  False    True   True    False            1
3    a  no_event   True   False   True     True            2
4    a       bbb  False    True   True    False            2
5    b      asdf  False   False  False    False            2
6    b  no_event   True   False   True     True            3
7    b      ghtu  False    True   True    False            3
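Optionally, GroupBy.ngroup can show the final (user, g) groups that cumcount counts within (just an extra check, not part of the solution):

# label every (user, g) pair - RN2 restarts whenever this label changes
print (df.assign(group_id = df.groupby(['user', g]).ngroup()))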
Still not 100% sure if the shift per group is necessary; it mainly depends on the data:
m = df['events'].eq('no_event')
g = (m.ne(m.groupby(df['user']).shift()) & m).cumsum()
df['RN'] = df.groupby(['user', g]).cumcount().add(1)
EDIT1: It is the same:
import numpy as np

np.random.seed(123)
N = 1000
L = ['no_event','a','s']
df = pd.DataFrame({'user': np.random.randint(100, size=N),
                   'events': np.random.choice(L, size=N)}).sort_values('user')
m = df['events'].eq('no_event')
g = (m.ne(m.shift()) & m).cumsum()
df['RN1'] = df.groupby(['user', g]).cumcount().add(1)
m = df['events'].eq('no_event')
g = (m.ne(m.groupby(df['user']).shift()) & m).cumsum()
df['RN2'] = df.groupby(['user', g]).cumcount().add(1)
print (df['RN2'].equals(df['RN1']))
True
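One way to see why both variants agree (a small, hypothetical boundary case, not from the data above): at a user change the helper g can differ between the two versions, but because user is also a grouping key, the counter restarts there anyway:

# hypothetical boundary case - consecutive no_event rows across a user change
df2 = pd.DataFrame({'user':   ['a', 'a', 'b', 'b'],
                    'events': ['x', 'no_event', 'no_event', 'no_event']})
m2 = df2['events'].eq('no_event')
g_all = (m2.ne(m2.shift()) & m2).cumsum()                        # 0, 1, 1, 1
g_per = (m2.ne(m2.groupby(df2['user']).shift()) & m2).cumsum()   # 0, 1, 2, 2
# the helpers differ, but the final counter is identical,
# because grouping by ['user', g] splits at the user change anyway
print (df2.groupby(['user', g_all]).cumcount().add(1).tolist())  # [1, 1, 1, 2]
print (df2.groupby(['user', g_per]).cumcount().add(1).tolist())  # [1, 1, 1, 2]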