17

I have this dataframe:

dates = pd.date_range(start='2016-01-01', periods=20, freq='d')
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.concatenate((dates, dates)),
                   'C': np.arange(40)})

I sorted the data frame by Date:

df.sort_values('B',inplace=True)

I am looking to do a forward rolling sum on date. However, I can only do backward rolling sum using:

df.groupby('A').rolling(7, on='B',min_periods=0).C.sum()

A  B         
1  2016-01-01      0.0
   2016-01-02      1.0
   2016-01-03      3.0
   2016-01-04      6.0
   2016-01-05     10.0
   2016-01-06     15.0

I want to do forward rolling sum.

Community
  • 1
  • 1
Ahamed Moosa
  • 1,395
  • 7
  • 16
  • 30

6 Answers6

22

I believe need change ordering by iloc[::-1]:

df1 = (df.iloc[::-1]
        .groupby('A', sort=False)
        .rolling(7, on='B',min_periods=0).C
        .sum()
        .iloc[::-1])
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
9

Simple as:

df['B'] = df['A'].rolling(3).sum().shift(-3)
Suraj Rao
  • 29,388
  • 11
  • 94
  • 103
Daniel Manso
  • 91
  • 1
  • 2
  • 2
    this will return 3 empty rows. – Muhammad Usman Sep 07 '21 at 17:06
  • 1
    Wow! I think this was a very simple and elegant solution of getting a forward rolling operation, thanks! For those who wish to use this method, I would like to point out that they might need to increment the shifting value by one (in absolute value) if the rolling function does not include the data at that row. For example, if you uses a 'closed' parameter of 'left' or 'neither' for '.rolling()', then the data at the same row is not included in the rolling function; and in that case, you need to use '.shift(-4)' to shift the data one row further to exclude the original row. – hekimgil Mar 08 '22 at 22:10
7

Setup

dates = pd.date_range(start='2016-01-01', periods=20, freq='d')
df = pd.DataFrame({'A': [1] * 20 + [2] * 12 + [3] * 8,
                   'B': np.concatenate((dates, dates)),
                   'C': np.arange(40)})

Sort by 'B' then when we roll, roll the reverse with iloc[::-1]

def rev_roll(x):
    return x.iloc[::-1].rolling(7, min_periods=0).sum().iloc[::-1]

df.assign(Roll=df.sort_values('B').groupby('A').C.transform(rev_roll))

Output


    A          B   C  Roll
0   1 2016-01-01   0    21
1   1 2016-01-02   1    28
2   1 2016-01-03   2    35
3   1 2016-01-04   3    42
4   1 2016-01-05   4    49
5   1 2016-01-06   5    56
6   1 2016-01-07   6    63
7   1 2016-01-08   7    70
8   1 2016-01-09   8    77
9   1 2016-01-10   9    84
10  1 2016-01-11  10    91
11  1 2016-01-12  11    98
12  1 2016-01-13  12   105
13  1 2016-01-14  13   112
14  1 2016-01-15  14    99
15  1 2016-01-16  15    85
16  1 2016-01-17  16    70
17  1 2016-01-18  17    54
18  1 2016-01-19  18    37
19  1 2016-01-20  19    19
20  2 2016-01-01  20   161
21  2 2016-01-02  21   168
22  2 2016-01-03  22   175
23  2 2016-01-04  23   182
24  2 2016-01-05  24   189
25  2 2016-01-06  25   196
26  2 2016-01-07  26   171
27  2 2016-01-08  27   145
28  2 2016-01-09  28   118
29  2 2016-01-10  29    90
30  2 2016-01-11  30    61
31  2 2016-01-12  31    31
32  3 2016-01-13  32   245
33  3 2016-01-14  33   252
34  3 2016-01-15  34   219
35  3 2016-01-16  35   185
36  3 2016-01-17  36   150
37  3 2016-01-18  37   114
38  3 2016-01-19  38    77
39  3 2016-01-20  39    39
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • The use of transform is a good one if you want to add the new column to the original data frame. – Jia Gao Mar 04 '19 at 17:38
3

I think you want

df["C"] = df["A"].cumsum()

See documentation here

Toby Petty
  • 4,431
  • 1
  • 17
  • 29
1

If your dates aren't fully continuous (like you've got a missing day or two here and there) and you want a fixed N-Days window (and not N-records window), you can use:

def forward_rolling_mean(sub_df, col='units', days_ahead=7):
    rolling_data = [sub_df[sub_df['date'].between(date+pd.Timedelta(days=1), date+pd.Timedelta(days=1+days_ahead-1))][col].mean() for date in sub_df['date']]
    return pd.DataFrame({'%s_next%idays_mean' % (col, days_ahead): rolling_data}, index=sub_df['date'])

you can also change it to return a series instead of dataframe. later on you can join this to the original data.

Idan Richman
  • 633
  • 5
  • 7
1
indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=7)
df.groupby('A').rolling(window=indexer, on='B',min_periods=0).C.sum()

pandas doc

eXcellme
  • 11
  • 3