Working with time series in pandas, I am trying to compute the mean over a window that slides over the data. I apologize for keeping so many parameters (period_full, period_lookback, period_lookahead, offset) in the following minimal working example, but they are significant to the question. To make the question easier to read, I will briefly explain what the parameters do:

The function takes an argument period_full (of type Timedelta), the length of the sliding window. By default, the window is centered. If the optional argument offset is given, the window is shifted to the left (if offset is negative) or to the right (if offset is positive).
The Timedeltas period_lookback and period_lookahead are computed internally from period_full and offset. Alternatively, they can be passed to the function directly; in that case, period_full need not be passed.
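
To make the relationship between the parameters concrete, here is a minimal sketch of how the two half-windows follow from period_full and offset. The helper name window_bounds is made up for this illustration and is not part of the function below.

```python
import pandas as pd


def window_bounds(period_full: pd.Timedelta, offset: pd.Timedelta = None):
    """Hypothetical helper: split a window into lookback and lookahead parts."""
    # A centered window looks back and ahead by half its length.
    half = period_full / 2
    if offset is None:
        offset = pd.Timedelta(0)
    # A positive offset shifts the window to the right:
    # less lookback, more lookahead.
    return half - offset, half + offset


# A 10-minute window shifted 2 minutes to the right.
lookback, lookahead = window_bounds(pd.Timedelta("10min"), pd.Timedelta("2min"))
print(lookback, lookahead)
```

With these inputs, the window around a timestamp t spans from t - 3 minutes to t + 7 minutes.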

This is the code:

import pandas as pd

def sliding_mean(
    ts: pd.Series,
    period_full: pd.Timedelta =None,
    period_lookback: pd.Timedelta =None,
    period_lookahead: pd.Timedelta =None,
    offset: pd.Timedelta =None,
) -> pd.core.series.Series:

    if period_full is None:
        assert(period_lookback is not None and period_lookahead is not None), 'Parameter period_full was not passed. In this case, the parameters period_lookback and period_lookahead are mandatory.'
    else:
        period_lookback  = period_full / 2
        period_lookahead = period_full / 2

    if (offset is not None):
        period_lookback  -= offset
        period_lookahead += offset

    mean_list = []
    for time_stamp in ts.index:
        mean_list.append(
            ts[
                time_stamp-period_lookback : time_stamp+period_lookahead
            ].mean()
        )

    ser = pd.Series(mean_list, index = ts.index)
    return ser

This works, but it is very slow, probably because I build a list instead of vectorizing the computation. I thought that the method .rolling() might do this for me. However, .rolling() expects an offset rather than a Timedelta. The following is the altered code:

import pandas as pd

def sliding_mean(
    ts: pd.Series,
    period_full: pd.Timedelta =None,
    period_lookback: pd.Timedelta =None,
    period_lookahead: pd.Timedelta =None,
    offset: pd.Timedelta =None,
) -> pd.core.series.Series:

    if period_full is None:
        assert(period_lookback is not None and period_lookahead is not None), 'Parameter period_full was not passed. In this case, the parameters period_lookback and period_lookahead are mandatory.'
    else:
        period_lookback  = period_full / 2
        period_lookahead = period_full / 2

    if (offset is not None):
        period_lookback  -= offset
        period_lookahead += offset

    # Convert the pd.Timedeltas to pd.DateOffsets. These can be used in the
    # function .rolling(), whereas usage of Timedeltas is deprecated.
    period_lookback = pd.DateOffset(seconds=period_lookback.total_seconds())
    period_lookahead = pd.DateOffset(seconds=period_lookahead.total_seconds())
    # period_full and offset are optional, so only convert them if they were passed.
    if period_full is not None:
        period_full = pd.DateOffset(seconds=period_full.total_seconds())
    if offset is not None:
        offset = pd.DateOffset(seconds=offset.total_seconds())

    rolled = ts.rolling(window=period_full, closed='both', center=True)
    ser = rolled.mean()

    return ser

This is quite satisfactory, except that I did not find a way to

  • add an offset (instead, the window is always centered, because of center=True)
  • use period_lookback and period_lookahead directly.

While I could live without the second, I need the first for my use case.

Is it possible to pass more detailed information over to ts.rolling()?
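
For comparison, the per-timestamp loop from the first version can be vectorized without .rolling() at all. The following is only a sketch under stated assumptions: the index is a sorted DatetimeIndex, the series contains no NaN values, and the name sliding_mean_vectorized is made up for this illustration. It uses np.searchsorted to locate the window edges and prefix sums to get each window's total, so it does not answer whether .rolling() itself can take an asymmetric window.

```python
import numpy as np
import pandas as pd


def sliding_mean_vectorized(
    ts: pd.Series,
    period_lookback: pd.Timedelta,
    period_lookahead: pd.Timedelta,
) -> pd.Series:
    """Mean of ts over [t - period_lookback, t + period_lookahead] for each t.

    Hypothetical helper; assumes a sorted DatetimeIndex and no NaN values.
    """
    times = ts.index.values  # datetime64 array
    values = ts.to_numpy(dtype=float)

    # Position of the first sample inside each window and one past the last.
    # Both window ends are inclusive, as with label-based slicing.
    left = np.searchsorted(
        times, times - period_lookback.to_timedelta64(), side="left"
    )
    right = np.searchsorted(
        times, times + period_lookahead.to_timedelta64(), side="right"
    )

    # Prefix sums turn each window sum into a difference of two entries.
    prefix = np.concatenate(([0.0], np.cumsum(values)))
    sums = prefix[right] - prefix[left]
    counts = right - left

    # Empty windows (possible for large offsets) give NaN,
    # like .mean() on an empty slice.
    means = np.full(len(values), np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return pd.Series(means, index=ts.index)


# Irregularly sampled example series.
idx = pd.to_datetime(
    ["2021-01-01 00:00", "2021-01-01 00:01", "2021-01-01 00:03", "2021-01-01 00:06"]
)
ts = pd.Series([1.0, 2.0, 4.0, 8.0], index=idx)
result = sliding_mean_vectorized(ts, pd.Timedelta("1min"), pd.Timedelta("2min"))
print(result)
```

Because the window sums come from differences of cumulative sums, results can differ from the loop version by floating-point rounding on long series.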

NerdOnTour
  • Ever so sorry, I just found https://stackoverflow.com/questions/54660534/rolling-mean-with-time-offset-pandas?rq=1 If that thread solves my question, I will delete this one here. Edit: The referenced question is facing a slightly different problem; moreover, the answer given there does not help in my case. – NerdOnTour Dec 21 '21 at 16:22
  • is there a way you could provide some examples of what you are trying to achieve? Are the time series evenly sampled, or irregular? Are there corner cases that we should be aware of? – Pierre D Dec 21 '21 at 19:13
  • I had hoped for a solution that can handle irregularly sampled time series, but I could arrange for evenly sampled time series if a solution needs that. (I suppose you might come up with a NumPy solution now ...) No corner cases come to my mind right now. – NerdOnTour Dec 21 '21 at 20:20
