I want to sum the prior N periods of data for each group. I have seen how to do each individually (sum by group, or sum prior N periods), but can't figure out a clean way to do both together.
I'm currently doing the following:
import pandas as pd
sample_data = {'user': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],\
'clicks': [0,1,2,3,4,5,6,7,8,9]}
df = pd.DataFrame(sample_data)
df['clicks.1'] = df.groupby(['user'])['clicks'].shift(1)
df['clicks.2'] = df.groupby(['user'])['clicks'].shift(2)
df['clicks.3'] = df.groupby(['user'])['clicks'].shift(3)
df['total_clicks_prior3'] = df[['clicks.1','clicks.2', 'clicks.3']].sum(axis=1)
I don't want the 3 intermediate lagged columns, I just want the sum of those, so my desired output is:
>>> df[['clicks','user','total_clicks_prior3']]
clicks user total_clicks_prior3
0 0 a NaN
1 1 a 0.0
2 2 a 1.0
3 3 a 3.0
4 4 a 6.0
5 5 b NaN
6 6 b 5.0
7 7 b 11.0
8 8 b 18.0
9 9 b 21.0
Note: I could obviously drop the 3 columns after creating them, but given that I will be creating multiple columns of different numbers of lagged periods, I feel like there has to be an easier way.