A similar approach but slightly more generic, we can encode c2
with pd.get_dummies
then take the groupby cumsum
relative to c1
then we can join
the newly created columns back to the original DataFrame:
df = df.join(
pd.get_dummies(df, columns=['c2'], prefix='Ct').groupby('c1').cumsum()
)
Alternatively str.get_dummies
can be used instead and pass the grouping Series directly to groupby:
df = df.join(
df['c2'].str.get_dummies().groupby(df['c1']).cumsum().add_prefix('Ct_')
)
df
:
c1 c2 Ct_X Ct_Y
0 A X 1 0
1 A X 2 0
2 A Y 2 1
3 A Y 2 2
4 B X 1 0
5 B X 2 0
6 B X 3 0
7 B Y 3 1
8 C X 1 0
9 C Y 1 1
10 C Y 1 2
11 C Y 1 3
How does this work?
pd.get_dummies
produces "dummy/indicator variables" for all values in c2
, resulting in:
pd.get_dummies(df, columns=['c2'], prefix='Ct')
c1 Ct_X Ct_Y
0 A 1 0
1 A 1 0
2 A 0 1
3 A 0 1
4 B 1 0
5 B 1 0
6 B 1 0
7 B 0 1
8 C 1 0
9 C 0 1
10 C 0 1
11 C 0 1
Now since the indicators are already in 1/0 format we can simply take the cumulative total for each group in c1
resulting in the desired new columns:
pd.get_dummies(df, columns=['c2'], prefix='Ct').groupby('c1').cumsum()
Ct_X Ct_Y
0 1 0
1 2 0
2 2 1
3 2 2
4 1 0
5 2 0
6 3 0
7 3 1
8 1 0
9 1 1
10 1 2
11 1 3
Notes
- This approach will calculate the cumulative total for all unique values in
c2
per c1
group.
- If using
pd.get_dummies
on a DataFrame with more columns than just c1
and c2
the DataFrame may need subset like:
pd.get_dummies(df[['c1', 'c2']], columns=['c2'], prefix='Ct').groupby('c1').cumsum()
Setup:
import pandas as pd
df = pd.DataFrame({
'c1': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
'c2': ['X', 'X', 'Y', 'Y', 'X', 'X', 'X', 'Y', 'X', 'Y', 'Y', 'Y']
})