5

I want to use rolling time windows in pandas for forward looking windows. How would I do that?

import pandas as pd

data = pd.DataFrame({'t': ['2017-02-02 15:00:01',
                 '2017-02-02 15:00:02',
                 '2017-02-02 15:01:00',
                 '2017-02-02 15:03:05',
                 '2017-02-02 15:08:00'],
        'value': [1, 2, 3, 14, 5]})
data['t'] = data['t'].apply(pd.to_datetime)
data = data.set_index('t')

backward_max = data.rolling('300s').max() # yields [1, 2, 3, 14, 14]

forward_max = '???' # should yield [14, 14, 14, 14, 5]

Problem is that because of unequal time differences between observations, I cannot simply shift the result from the backward looking windows.

I could calculate fake time-stamps that go in reverse order and use backward time windows, as could be derived from here, but I am almost certain there is a more elegant way.

Edit: Changed all timestamps to the same day and provide the following workaround code, as inelegant as it is.

data2 = data.reset_index()
data2['t2'] = max(data2['t']) - (data2['t'] - min(data2['t']))
data2 = data2.set_index('t2').sort_index()
forward_max = np.flip(data2.rolling('300s')['value'].max().values, axis=0)
user1965813
  • 671
  • 5
  • 16
  • why last one is 5 ? rather than 14 ? – BENY Feb 15 '18 at 15:18
  • 1
    The values for the points are [1, 2, 3, 14, 5]. A forward looking window starting at the last point only contains that one point, so the maximum of the values is the value of the last point, which is 5. – user1965813 Feb 15 '18 at 15:23
  • Time-based window shifts (or backwards rolling) is definitely a missing feature in pandas. Fake timestamps was the only way I could manage to do it efficiently (for a generic, non-equispaced data). – M0nZDeRR Aug 20 '19 at 13:57

3 Answers3

0

Based on your example, you could reverse the data frame/series, calculate your moving maximum, and reverse the results back. At least your example prediction can be achieved this way. (Tested with Pandas v2.0.2)

forward_max = data[::-1].rolling('300s').max()[::-1]
print(forward_max)
# Output:
#                      value
# t                         
# 2017-02-02 15:00:01   14.0
# 2017-02-02 15:00:02   14.0
# 2017-02-02 15:01:00   14.0
# 2017-02-02 15:03:05   14.0
# 2017-02-02 15:08:00    5.0
Steffen R.
  • 96
  • 5
-1

You can refer to this solution: Python: look back n days rolling standard deviation

However, it is still not a perfect solution. Somebody has raised an issue at Github, but it seems no one is working on it. Here is the link https://github.com/pandas-dev/pandas/issues/6772

luoshao23
  • 391
  • 2
  • 14
  • 1
    As far as I understand the solution they propose, they have regular time intervals, which is not the case in my situation. – user1965813 Jun 01 '18 at 08:22
-1

A possible solution is to shift the rolling window aggregation

  • 1
    As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Jan 11 '22 at 20:03
  • There is no way to shift time series by time amount. Your answer is incorrect. – mikkom Mar 24 '23 at 07:14