Running Count per Categorical Value with two columns

Question

I'm stuck on getting the "nth" instance of each categorical value per ID, i.e. a running count of how many times each activity has occurred so far within a case. I plan to use this for process mining.

Dataset:

ID  Activity
161 Diagnosis
161 ID Process
161 Dead Air
161 ID Process
161 Dead Air
161 ID Process
161 Resolution
162 Diagnosis
162 ID Process
162 Dead Air
162 ID Process
162 Resolution
163 Diagnosis
163 ID Process
163 Resolution
163 Dead Air
163 Resolution
164 Diagnosis
164 ID Process
164 Investigation
164 On Hold
164 Resolution
165 Diagnosis
165 ID Process
165 Investigation
165 Resolution
166 Diagnosis
166 ID Process
166 Dead Air
166 ID Process
166 Resolution
166 On Hold
166 Resolution
166 On Hold
167 Diagnosis
167 ID Process
167 Dead Air
167 ID Process
167 Resolution
168 Diagnosis
168 ID Process
168 Investigation
168 Dead Air
168 Investigation
168 Resolution
168 On Hold
169 Diagnosis
169 Resolution
169 Investigation
169 ID Process

Expected Result:

My Attempt:

df.groupby(["ID"])["Activity"].apply(lambda x : (x!=x.shift()).cumsum())

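This doesn't count instances of each activity. It numbers the consecutive runs of activities within each case, incrementing whenever the activity differs from the previous row. That could be a useful feature for my process mining, but it isn't the one I'm trying to build.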
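To make the difference concrete, here is a minimal sketch that runs both ideas on the rows for ID 161 only. It uses `transform` instead of `apply` for the attempt, an assumption made so the result stays aligned with the original rows (the index returned by `groupby(...).apply` can differ across pandas versions). The `groupby(["ID", "Activity"]).cumcount()` line is the same idea as the accepted answer below.

```python
import pandas as pd

# Rows for ID 161 only, copied from the dataset in the question
df = pd.DataFrame({
    "ID": [161] * 7,
    "Activity": ["Diagnosis", "ID Process", "Dead Air", "ID Process",
                 "Dead Air", "ID Process", "Resolution"],
})

# The attempt: increments whenever the activity differs from the previous row,
# so it numbers consecutive runs rather than repeats of the same activity
runs = df.groupby("ID")["Activity"].transform(lambda x: (x != x.shift()).cumsum())

# The goal: how many times this activity has appeared so far within the ID
nth = df.groupby(["ID", "Activity"]).cumcount() + 1

print(pd.DataFrame({"Activity": df["Activity"], "runs": runs, "nth": nth}))
```

For ID 161 every row starts a new run, so `runs` is simply 1 through 7, while `nth` gives 1, 1, 1, 2, 2, 3, 1: the second and third "ID Process" rows get 2 and 3.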

score 0 · Accepted Answer · answered May 19 '20 at 11:24


Use:

g = df.groupby(["ID", "Activity"]).cumcount().add(1)
print(g.to_string(index=False))

This prints:
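A self-contained version of the same call, run on the rows for IDs 163 and 166 from the question, is sketched below. It stores the result as a new column; the name `Instance` is an illustrative choice, not part of the answer.

```python
import pandas as pd

# A subset of the question's data (IDs 163 and 166)
df = pd.DataFrame({
    "ID": [163] * 5 + [166] * 8,
    "Activity": [
        "Diagnosis", "ID Process", "Resolution", "Dead Air", "Resolution",
        "Diagnosis", "ID Process", "Dead Air", "ID Process",
        "Resolution", "On Hold", "Resolution", "On Hold",
    ],
})

# cumcount() numbers rows 0, 1, 2, ... within each (ID, Activity) group in
# their original order; add(1) makes the count start at 1.
# The result is aligned with df's index, so it can be assigned directly.
df["Instance"] = df.groupby(["ID", "Activity"]).cumcount().add(1)
print(df)
# Instance: 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2, 2
```

Because `cumcount` follows row order, the rows need to be in chronological order within each ID for the count to mean "nth occurrence so far".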

Shubham Sharma

Why not close as a dupe? – jezrael May 19 '20 at 11:26
@jezrael yeah.. – Shubham Sharma May 19 '20 at 11:29
If it's some rare dupe I don't care, but this? – jezrael May 19 '20 at 11:29