how to do forward rolling sum in pandas?

Question

I have this dataframe:

dates = pd.date_range(start='2016-01-01', periods=20, freq='d')
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.concatenate((dates, dates)),
                   'C': np.arange(40)})

I sorted the data frame by Date:

df.sort_values('B',inplace=True)

I am looking to do a forward rolling sum on date. However, I can only do backward rolling sum using:

df.groupby('A').rolling(7, on='B',min_periods=0).C.sum()

A  B         
1  2016-01-01      0.0
   2016-01-02      1.0
   2016-01-03      3.0
   2016-01-04      6.0
   2016-01-05     10.0
   2016-01-06     15.0

I want to do forward rolling sum.

Please provide an expected output. – galmeriol May 18 '18 at 12:46 — galmeriol, May 18 '18 at 12:46

score 22 · Accepted Answer · answered May 18 '18 at 12:14

22

I believe need change ordering by iloc[::-1]:

df1 = (df.iloc[::-1]
        .groupby('A', sort=False)
        .rolling(7, on='B',min_periods=0).C
        .sum()
        .iloc[::-1])

answered May 18 '18 at 12:14

jezrael

822,522
95
1,334
1,252

Hi jez I checked your solution It worked perfectly well Thank you man – Ahamed Moosa May 18 '18 at 13:08
@AhamedMoosa feel free to upvote any answer you found helpful including the one you just accepted. – piRSquared May 18 '18 at 13:11
@jezrael Duly Obliged :) – Ahamed Moosa May 18 '18 at 13:20
1

Simpler and faster to use `df[::-1]` in place of `df.iloc[::-1]`. – Attila the Fun Jul 28 '21 at 20:49

score 9 · Answer 2 · edited Feb 17 '21 at 16:59

9

Simple as:

df['B'] = df['A'].rolling(3).sum().shift(-3)

edited Feb 17 '21 at 16:59

Suraj Rao

29,388
11
94
103

answered Feb 17 '21 at 16:57

Daniel Manso

91
1
2

2

this will return 3 empty rows. – Muhammad Usman Sep 07 '21 at 17:06
1

Wow! I think this was a very simple and elegant solution of getting a forward rolling operation, thanks! For those who wish to use this method, I would like to point out that they might need to increment the shifting value by one (in absolute value) if the rolling function does not include the data at that row. For example, if you uses a 'closed' parameter of 'left' or 'neither' for '.rolling()', then the data at the same row is not included in the rolling function; and in that case, you need to use '.shift(-4)' to shift the data one row further to exclude the original row. – hekimgil Mar 08 '22 at 22:10

piRSquared · Answer 3 · 2018-05-18T12:53:25.917

Setup

dates = pd.date_range(start='2016-01-01', periods=20, freq='d')
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.concatenate((dates, dates)),
                   'C': np.arange(40)})

Sort by `'B'` then when we roll, roll the reverse with `iloc[::-1]`

def rev_roll(x):
    return x.iloc[::-1].rolling(7, min_periods=0).sum().iloc[::-1]

df.assign(Roll=df.sort_values('B').groupby('A').C.transform(rev_roll))

Output

    A          B   C  Roll
0   1 2016-01-01   0    21
1   1 2016-01-02   1    28
2   1 2016-01-03   2    35
3   1 2016-01-04   3    42
4   1 2016-01-05   4    49
5   1 2016-01-06   5    56
6   1 2016-01-07   6    63
7   1 2016-01-08   7    70
8   1 2016-01-09   8    77
9   1 2016-01-10   9    84
10  1 2016-01-11  10    91
11  1 2016-01-12  11    98
12  1 2016-01-13  12   105
13  1 2016-01-14  13   112
14  1 2016-01-15  14    99
15  1 2016-01-16  15    85
16  1 2016-01-17  16    70
17  1 2016-01-18  17    54
18  1 2016-01-19  18    37
19  1 2016-01-20  19    19
20  2 2016-01-01  20   161
21  2 2016-01-02  21   168
22  2 2016-01-03  22   175
23  2 2016-01-04  23   182
24  2 2016-01-05  24   189
25  2 2016-01-06  25   196
26  2 2016-01-07  26   171
27  2 2016-01-08  27   145
28  2 2016-01-09  28   118
29  2 2016-01-10  29    90
30  2 2016-01-11  30    61
31  2 2016-01-12  31    31
32  3 2016-01-13  32   245
33  3 2016-01-14  33   252
34  3 2016-01-15  34   219
35  3 2016-01-16  35   185
36  3 2016-01-17  36   150
37  3 2016-01-18  37   114
38  3 2016-01-19  38    77
39  3 2016-01-20  39    39

The use of transform is a good one if you want to add the new column to the original data frame. — Jia Gao, Mar 04 '19 at 17:38

score 3 · Answer 4 · answered May 18 '18 at 12:46

3

I think you want

df["C"] = df["A"].cumsum()

See documentation here

answered May 18 '18 at 12:46

Toby Petty

4,431
1
17
29

score 1 · Answer 5 · answered Mar 29 '20 at 07:34

If your dates aren't fully continuous (like you've got a missing day or two here and there) and you want a fixed N-Days window (and not N-records window), you can use:

def forward_rolling_mean(sub_df, col='units', days_ahead=7):
    rolling_data = [sub_df[sub_df['date'].between(date+pd.Timedelta(days=1), date+pd.Timedelta(days=1+days_ahead-1))][col].mean() for date in sub_df['date']]
    return pd.DataFrame({'%s_next%idays_mean' % (col, days_ahead): rolling_data}, index=sub_df['date'])

you can also change it to return a series instead of dataframe. later on you can join this to the original data.

score 1 · Answer 6 · answered Aug 21 '22 at 11:36

1

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=7)
df.groupby('A').rolling(window=indexer, on='B',min_periods=0).C.sum()

pandas doc

answered Aug 21 '22 at 11:36

eXcellme

11
3

how to do forward rolling sum in pandas?

6 Answers6

Setup

Sort by `'B'` then when we roll, roll the reverse with `iloc[::-1]`

Output

Linked

how to do forward rolling sum in pandas?

6 Answers6

Setup

Sort by 'B' then when we roll, roll the reverse with iloc[::-1]

Output

Linked

Sort by `'B'` then when we roll, roll the reverse with `iloc[::-1]`