26

Suppose I have a time series:

In[138] rng = pd.date_range('1/10/2011', periods=10, freq='D')
In[139] ts = pd.Series(randn(len(rng)), index=rng)
In[140]
Out[140]:
2011-01-10    0
2011-01-11    1
2011-01-12    2
2011-01-13    3
2011-01-14    4
2011-01-15    5
2011-01-16    6
2011-01-17    7
2011-01-18    8
2011-01-19    9
Freq: D, dtype: int64

If I use one of the rolling_* functions, for instance rolling_sum, I can get the behavior I want for backward looking rolling calculations:

In [157]: pd.rolling_sum(ts, window=3, min_periods=0)
Out[157]: 
2011-01-10     0
2011-01-11     1
2011-01-12     3
2011-01-13     6
2011-01-14     9
2011-01-15    12
2011-01-16    15
2011-01-17    18
2011-01-18    21
2011-01-19    24
Freq: D, dtype: float64

But what if I want to do a forward-looking sum? I've tried something like this:

In [161]: pd.rolling_sum(ts.shift(-2, freq='D'), window=3, min_periods=0)
Out[161]: 
2011-01-08     0
2011-01-09     1
2011-01-10     3
2011-01-11     6
2011-01-12     9
2011-01-13    12
2011-01-14    15
2011-01-15    18
2011-01-16    21
2011-01-17    24
Freq: D, dtype: float64

But that's not exactly the behavior I want. What I am looking for as an output is:

2011-01-10    3
2011-01-11    6
2011-01-12    9
2011-01-13    12
2011-01-14    15
2011-01-15    18
2011-01-16    21
2011-01-17    24
2011-01-18    17
2011-01-19    9

ie - I want the sum of the "current" day plus the next two days. My current solution is not sufficient because I care about what happens at the edges. I know I could solve this manually by setting up two additional columns that are shifted by 1 and 2 days respectively and then summing the three columns, but there's got to be a more elegant solution.

user2543645
  • 649
  • 1
  • 10
  • 15
  • I found [this thread](https://groups.google.com/forum/#!msg/pydata/LLPnzMfDlSg/OpK-mHJzJwYJ) from about a year ago, so it looks like there isn't "official" support yet. But the interim solution posted there also doesn't do what I want at the edges (18th/19th in my example are still blank) – user2543645 Apr 02 '14 at 18:54
  • Pandas 1.1 added "support" for forward looking window ops with [FixedForwardWindowIndexer](https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.api.indexers.FixedForwardWindowIndexer.html#pandas.api.indexers.FixedForwardWindowIndexer), but when I tried applying it to this data my interpreter segfaulted... perhaps this doesn't support forward datetime windows yet but will eventually. – cs95 Aug 03 '20 at 09:20

5 Answers5

31

Why not just do it on the reversed Series (and reverse the answer):

In [11]: pd.rolling_sum(ts[::-1], window=3, min_periods=0)[::-1]
Out[11]:
2011-01-10     3
2011-01-11     6
2011-01-12     9
2011-01-13    12
2011-01-14    15
2011-01-15    18
2011-01-16    21
2011-01-17    24
2011-01-18    17
2011-01-19     9
Freq: D, dtype: float64
Andy Hayden
  • 359,921
  • 101
  • 625
  • 535
  • 2
    Clever answer, thanks. Although this only works when you have all dates in a range. In your case you didn't specify a frequency. When you do specify, for instance, freq='D', this solution no longer works. So if I had missing data or a variable sampling frequency this wouldn't work. It seems like this would be a really valuable feature to implement properly. Does anyone know if it's in the pipeline? – user2543645 Apr 03 '14 at 00:02
  • @user2543645 I created an issue about this [here](https://github.com/pydata/pandas/issues/6772). I'd not realised this wouldn't work in some cases, interesting. Could you resample first (to get consistent freq) and then rolling_ in reverse? – Andy Hayden Apr 03 '14 at 01:11
  • I think that should work, but only if you resample in the frequency that you want to aggregate on. If you have sparse data, this could dramatically increase your set size. For instance, if you have a few days worth of hourly observations collected in different years, you'd have to add in thousands of null values. It would be nice if there was a more native support that truly behaved the same way that rolling_* does, just in the other direction. – user2543645 Apr 03 '14 at 02:40
9

I struggled with this then found an easy way using shift.

If you want a rolling sum for the next 10 periods, try:

df['NewCol'] = df['OtherCol'].shift(-10).rolling(10, min_periods = 0).sum()

We use shift so that "OtherCol" shows up 10 rows ahead of where it normally would be, then we do a rolling sum over the previous 10 rows. Because we shifted, the previous 10 rows are actually the future 10 rows of the unshifted column. :)

9

Pandas recently added a new feature which enables you to implement forward looking rolling. You have to upgrade to pandas 1.1.0 to get the new feature.

indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=3)
ts.rolling(window=indexer, min_periods=1).sum()
ort
  • 199
  • 1
  • 3
  • 1
    this only works if you have a complete series of uninterrupted dates.You cannot use the '#D' window that rolling() can use – geominded Feb 19 '21 at 22:43
1

Maybe you can try bottleneck module. When ts is large, bottleneck is much faster than pandas

import bottleneck as bn
result = bn.move_sum(ts[::-1], window=3, min_count=1)[::-1]

And bottleneck has other rolling functions, such as move_max, move_argmin, move_rank.

William
  • 4,258
  • 2
  • 23
  • 20
0

Try this one for a rolling window of 3:

window = 3
ts.rolling(window).sum().shift(-window + 1)