
The reason I am asking is the following:

I have a time series of 15-minute OHLC financial data. For any given data point, I would like to calculate the maximum and the minimum price in the future (hence forward-looking) over a specific time frame following that data point. The rows belonging to that time frame are not fixed, either in number or in offset position.

Here is an example to illustrate what I mean.

Let's say I have the open, high, low and close for 23:45 on Tuesday the 28th of April. I would like to know the max and the min for the periods:

  1. one-day ahead, so it would need to group all Wednesday 29th of April data
  2. one-hour ahead, so it would be next 4 rows
  3. one-week ahead, so it would be all 672 rows (4 rows x 24 hours x 7 days) beginning with the 4th of May, which would be Monday, hence next week.

As you can see, the function would "need to know" how its current time position (in a day, week, month) determines the window I'm interested in (which is forward-looking and offset by a sliding variable).

Is there a way to do this without resorting to for-loops and custom functions? Thanks!

rioZg

2 Answers


df.rolling can accept a string frequency offset as its first argument. For example,

import numpy as np
import pandas as pd
np.random.seed(2018)

# Generate a DataFrame with an irregular DatetimeIndex
N = 20
start = np.datetime64('2018-01-01').astype('M8[s]').view('<i8')
end = np.datetime64('2018-02-01').astype('M8[s]').view('<i8')
timestamps = np.random.uniform(start, end, size=N)
timestamps.sort()
index = timestamps.astype('M8[s]')

df = pd.DataFrame(np.random.randint(10, size=(N, 4)), columns=list('OHLC'),
                  index=index)

This computes a rolling mean using a 2-day window size:

df.rolling('2D').mean()

This computes a rolling mean using a 7-day (i.e. weekly) window size:

df.rolling('7D').mean()

Use 1H for a 1-hour window, 1D for a 1-day window, and 7D for a 1-week window. (Recent pandas releases prefer the lowercase alias 1h for hours and deprecate the uppercase form.)

The number of rows corresponding to the rolling window need not be constant.
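Since the question asks for maxima and minima rather than means, here is a minimal sketch of the same idea with max and min. The bars DataFrame and its values are made up for illustration, and the window looks backward (the default), not forward:

```python
import pandas as pd

# Hypothetical 15-minute bars with a one-hour gap after 09:30 (illustrative values).
idx = pd.to_datetime([
    "2018-01-01 09:00", "2018-01-01 09:15", "2018-01-01 09:30",
    "2018-01-01 10:30", "2018-01-01 10:45",
])
bars = pd.DataFrame({"H": [5, 7, 6, 9, 8], "L": [4, 5, 3, 6, 7]}, index=idx)

# Trailing one-hour window: each row aggregates the rows in (t - 60min, t].
roll = bars.rolling("60min")
trailing = pd.DataFrame({
    "max_high": roll["H"].max(),  # highest high within the past hour
    "min_low": roll["L"].min(),   # lowest low within the past hour
})
print(trailing)
```

Because of the gap, the 10:30 row only sees itself, while the 09:30 row sees three bars.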


To check that the above code is producing the desired result, let's confirm the last two rows of df.rolling('7D').mean().

In [91]: df.rolling('7D').mean().tail(2)
Out[91]: 
                            O         H    L         C
2018-01-30 05:22:18  4.285714  3.000000  5.0  3.428571
2018-01-31 23:45:18  3.833333  2.833333  4.5  3.166667

The last row corresponds to means taken over this 7-day DataFrame:

In [93]: end = df.index[-1]; window = df.loc[end-pd.Timedelta(days=7):end]; window
Out[93]: 
                     O  H  L  C
2018-01-25 21:17:07  1  2  1  2
2018-01-26 22:48:38  6  0  3  1
2018-01-28 08:28:04  0  8  7  5
2018-01-29 02:48:53  8  0  2  3
2018-01-30 05:22:18  6  0  8  8
2018-01-31 23:45:18  2  7  6  0

In [94]: window.mean()
Out[94]: 
O    3.833333
H    2.833333
L    4.500000
C    3.166667
dtype: float64

The values in window.mean() match the values in the last row of df.rolling('7D').mean().

Similarly, we can confirm the result in the second to last row by setting end = df.index[-2]:

In [95]: end = df.index[-2]; window = df.loc[end-pd.Timedelta(days=7):end]; window
Out[95]: 
                     O  H  L  C
2018-01-23 12:05:33  9  8  9  4
2018-01-24 11:16:36  0  3  5  1
2018-01-25 21:17:07  1  2  1  2
2018-01-26 22:48:38  6  0  3  1
2018-01-28 08:28:04  0  8  7  5
2018-01-29 02:48:53  8  0  2  3
2018-01-30 05:22:18  6  0  8  8

In [96]: window.mean()
Out[96]: 
O    4.285714
H    3.000000
L    5.000000
C    3.428571
dtype: float64

In [99]: window.mean().equals(df.rolling('7D').mean().loc[end])
Out[99]: True

Notice that the actual number of rows in the two windows differs (6 vs 7).
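The row count per window can also be read off directly with count(). The sketch below uses a small hypothetical series (not the df above). It also shows one subtlety of the manual check: a label slice with .loc includes both endpoints, whereas the default rolling window excludes the left edge, so the two only agree when no row sits exactly one window length earlier.

```python
import pandas as pd

# Hypothetical irregular daily timestamps; the values are arbitrary.
idx = pd.to_datetime(["2018-01-01", "2018-01-03", "2018-01-04",
                      "2018-01-09", "2018-01-10", "2018-01-17"])
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Number of observations in each trailing 7-day window (t - 7D, t].
rows_per_window = s.rolling("7D").count()

# Manual slice for the row at 2018-01-10: .loc includes BOTH endpoints,
# so it also picks up 2018-01-03, which lies exactly 7 days earlier.
t = idx[4]
manual = s.loc[t - pd.Timedelta(days=7):t]

print(rows_per_window)
print(len(manual), "rows in the inclusive slice vs", int(rows_per_window.loc[t]), "in the rolling window")
```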

unutbu
  • This, however, does not answer the question, which asks how to calculate a rolling window with an offset going FORWARD in time (df.rolling('-7D') doesn't work). I found the answer elsewhere and am posting it below. – PGlivi Jul 07 '20 at 08:11

According to this answer, temporarily reversing the time series before using the rolling function works.

Here is an example that uses this idea (it works the same way with any time series, offset and aggregating function):

Let's generate some random irregularly indexed time series:

import pandas as pd
import numpy as np
    
length = 15
# generate 15 unique days within 90 days from '2020-01-01'  
dates = np.datetime64('2020-01-01') + np.random.choice(90, size = length, replace = False)
ts = pd.Series(np.random.randint(0,9, size = length), index = dates).sort_index()

In[1]: ts
Out[1]: 
2020-01-04    7
2020-01-10    2
2020-01-12    4
2020-01-19    8
2020-02-04    3
2020-02-05    8
2020-02-07    5
2020-02-19    7
2020-02-24    6
2020-02-25    4
2020-03-01    8
2020-03-04    0
2020-03-14    6
2020-03-15    7
2020-03-28    6
dtype: int32

Now, to find the maximum over the 1-week window looking FORWARD from each point, we just need to apply the rolling function with the '7D' offset to the reversed series and then reverse the result back:

In[2]: ts[::-1].rolling(window = '7D').max()[::-1]
Out[2]: 
2020-01-04    7.0
2020-01-10    4.0
2020-01-12    4.0
2020-01-19    8.0
2020-02-04    8.0
2020-02-05    8.0
2020-02-07    5.0
2020-02-19    7.0
2020-02-24    8.0
2020-02-25    8.0
2020-03-01    8.0
2020-03-04    0.0
2020-03-14    7.0
2020-03-15    7.0
2020-03-28    6.0
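Applied to the OHLC case from the question, the same trick might look like the sketch below. The bars DataFrame and its values are hypothetical. For the "next 4 rows" case it uses pandas.api.indexers.FixedForwardWindowIndexer (available in recent pandas versions) instead of a time offset; that is a substitute technique for row-count windows, not part of the reversal trick.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Hypothetical 15-minute bars with a one-hour gap after 09:30 (illustrative values).
idx = pd.to_datetime([
    "2020-04-28 09:00", "2020-04-28 09:15", "2020-04-28 09:30",
    "2020-04-28 10:30", "2020-04-28 10:45",
])
bars = pd.DataFrame({"H": [5, 7, 6, 9, 8], "L": [4, 5, 3, 6, 7]}, index=idx)

# Time-based forward window: reverse, roll, reverse back.
# Each row then aggregates the rows in [t, t + 60min), current row included.
rev = bars[::-1].rolling("60min")
fwd_time = pd.DataFrame({
    "max_high": rev["H"].max()[::-1],
    "min_low": rev["L"].min()[::-1],
})

# Row-based forward window: the next 4 rows, current row excluded.
# shift(-1) pulls each row's successor into its place before rolling forward;
# min_periods=1 lets the shorter windows near the end still produce a value.
indexer = FixedForwardWindowIndexer(window_size=4)
fwd_rows = pd.DataFrame({
    "max_high": bars["H"].shift(-1).rolling(indexer, min_periods=1).max(),
    "min_low": bars["L"].shift(-1).rolling(indexer, min_periods=1).min(),
})

print(fwd_time)
print(fwd_rows)
```

Note that the reversed time-based window includes the current row and excludes a row lying exactly one window length ahead, which matches the output above (2020-01-12 does not see the 8 on 2020-01-19). Neither window is calendar-aligned; "all of the next calendar day" would need a grouping by date rather than a rolling window.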
PGlivi