Pandas: Sum Previous N Rows by Group

Question

I want to sum the prior N periods of data for each group. I have seen how to do each individually (sum by group, or sum prior N periods), but can't figure out a clean way to do both together.

I'm currently doing the following:

import pandas as pd

sample_data = {'user': ['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'b'],\
                'clicks': [0,1,2,3,4,5,6,7,8,9]}
df = pd.DataFrame(sample_data)
df['clicks.1'] = df.groupby(['user'])['clicks'].shift(1)
df['clicks.2'] = df.groupby(['user'])['clicks'].shift(2)
df['clicks.3'] = df.groupby(['user'])['clicks'].shift(3)
df['total_clicks_prior3'] = df[['clicks.1','clicks.2', 'clicks.3']].sum(axis=1)

I don't want the 3 intermediate lagged columns, I just want the sum of those, so my desired output is:

>>> df[['clicks','user','total_clicks_prior3']]
   clicks user  total_clicks_prior3
0       0    a                  NaN
1       1    a                  0.0
2       2    a                  1.0
3       3    a                  3.0
4       4    a                  6.0
5       5    b                  NaN
6       6    b                  5.0
7       7    b                 11.0
8       8    b                 18.0
9       9    b                 21.0

Note: I could obviously drop the 3 columns after creating them, but given that I will be creating multiple columns of different numbers of lagged periods, I feel like there has to be an easier way.

In your example, you don't seem to take into account the current row, but instead use the previous 3, which means the last row is ignored completely in results. Is this intended? — user3483203, Sep 04 '19 at 17:55
Yes, that is intended. In the real world I suppose I actually may not have the value for the current observation - good catch. — Tchotchke, Sep 04 '19 at 17:59

score 7 · Accepted Answer · answered Sep 04 '19 at 17:50

This is groupby + rolling + shift

df.groupby('user')['clicks'].rolling(3, min_periods=1).sum().groupby(level=0).shift()

user
a     0     NaN
      1     0.0
      2     1.0
      3     3.0
      4     6.0
b     5     NaN
      6     5.0
      7    11.0
      8    18.0
      9    21.0
Name: clicks, dtype: float64

score 1 · Answer 2 · answered Sep 04 '19 at 17:57

If you have a solution that works for each group, you can use apply to use it on the groupby object. For instance, you linked to a question that has df['A'].rolling(min_periods=1, window=11).sum() as an answer. If that does what you want on the subgroups, you can do

df.groupby('user').apply(lambda x: x['clicks'].rolling(min_periods=1, window=11).sum())

Pandas: Sum Previous N Rows by Group

2 Answers2