Working with timeseries in pandas, I am trying to compute the mean over a window that slides over the data. I apologize for leaving so many parameters (period_full, period_lookback, period_lookahead, offset) in the following minimal working example, but they are relevant to the question. To make the question easier to read, I will briefly explain what the parameters do:
- The function takes an argument period_full (of type Timedelta), the length of the sliding window. By default, the window is centered. If the optional argument offset is given, the window is shifted to the left (if offset is negative) or to the right (if offset is positive).
- The Timedeltas period_lookback and period_lookahead are computed internally from period_full and offset. Alternatively, they can be passed to the function directly; in that case, period_full need not be passed.
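As a small worked example of the parameter arithmetic described above (the helper name window_bounds is hypothetical and only used for illustration), a 10-minute window with a 2-minute offset to the right looks back 3 minutes and ahead 7 minutes:

```python
import pandas as pd

def window_bounds(period_full, offset=None):
    # Hypothetical helper: split a full window into lookback/lookahead parts.
    lookback = period_full / 2
    lookahead = period_full / 2
    if offset is not None:
        # A positive offset shifts the window to the right,
        # a negative offset shifts it to the left.
        lookback -= offset
        lookahead += offset
    return lookback, lookahead

# Shifted to the right by 2 minutes.
lookback, lookahead = window_bounds(pd.Timedelta("10min"), pd.Timedelta("2min"))

# Shifted to the left by 2 minutes.
lookback_left, lookahead_left = window_bounds(
    pd.Timedelta("10min"), -pd.Timedelta("2min")
)
```

In both cases lookback + lookahead still equals period_full; only the position of the window relative to the current timestamp changes.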
This is the code:
import pandas as pd

def sliding_mean(
    ts: pd.Series,
    period_full: pd.Timedelta = None,
    period_lookback: pd.Timedelta = None,
    period_lookahead: pd.Timedelta = None,
    offset: pd.Timedelta = None,
) -> pd.Series:
    if period_full is None:
        assert period_lookback is not None and period_lookahead is not None, (
            'Parameter period_full was not passed. In this case, the parameters '
            'period_lookback and period_lookahead are mandatory.'
        )
    else:
        # Centered window by default; offset shifts it left (negative) or right (positive).
        period_lookback = period_full / 2
        period_lookahead = period_full / 2
        if offset is not None:
            period_lookback -= offset
            period_lookahead += offset
    mean_list = []
    for time_stamp in ts.index:
        # Label-based slicing on a DatetimeIndex includes both endpoints.
        mean_list.append(
            ts[
                time_stamp - period_lookback : time_stamp + period_lookahead
            ].mean()
        )
    ser = pd.Series(mean_list, index=ts.index)
    return ser
This works, but it is very slow, probably because I build a list in a Python loop instead of vectorizing the computation. I thought that the method .rolling() might do this for me. However, .rolling() expects an offset rather than a Timedelta. The following is the altered code:
import pandas as pd

def sliding_mean(
    ts: pd.Series,
    period_full: pd.Timedelta = None,
    period_lookback: pd.Timedelta = None,
    period_lookahead: pd.Timedelta = None,
    offset: pd.Timedelta = None,
) -> pd.Series:
    if period_full is None:
        assert period_lookback is not None and period_lookahead is not None, (
            'Parameter period_full was not passed. In this case, the parameters '
            'period_lookback and period_lookahead are mandatory.'
        )
        # The window length is needed below; the window is still centered.
        period_full = period_lookback + period_lookahead
    else:
        period_lookback = period_full / 2
        period_lookahead = period_full / 2
        if offset is not None:
            period_lookback -= offset
            period_lookahead += offset
    # Convert the pd.Timedeltas to pd.DateOffsets. These can be used in the function .rolling(), whereas usage of Timedeltas is deprecated.
    period_lookback = pd.DateOffset(seconds=period_lookback.total_seconds())
    period_lookahead = pd.DateOffset(seconds=period_lookahead.total_seconds())
    if offset is not None:
        # Converted from offset itself; not yet used by .rolling() below.
        offset = pd.DateOffset(seconds=offset.total_seconds())
    period_full = pd.DateOffset(seconds=period_full.total_seconds())
    rolled = ts.rolling(window=period_full, closed='both', center=True)
    ser = rolled.mean()
    return ser
This is quite satisfactory, except that I did not find a way to
- add an offset (instead, the window is always centered, because of center=True)
- use period_lookback and period_lookahead directly.
While the second would be OK, I need the first functionality for my use-case.
Is it possible to pass more detailed information to ts.rolling()?
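To make the target behaviour concrete, here is a minimal sketch of a vectorized equivalent of the loop version that does not use .rolling() at all. It is not a definitive solution: the function name sliding_mean_searchsorted is hypothetical, and the sketch assumes a sorted, timezone-naive DatetimeIndex and a series without NaN values. It locates each window's boundaries with numpy.searchsorted and takes the means from a cumulative sum.

```python
import numpy as np
import pandas as pd

def sliding_mean_searchsorted(ts, period_lookback, period_lookahead):
    # Assumption: ts.index is a sorted, tz-naive DatetimeIndex and ts has no NaNs.
    t = ts.index.to_numpy()
    # Window is [t - lookback, t + lookahead], both endpoints included,
    # matching label-based slicing in the loop version.
    left = np.searchsorted(t, t - period_lookback.to_timedelta64(), side="left")
    right = np.searchsorted(t, t + period_lookahead.to_timedelta64(), side="right")
    # Prefix sums: the sum over positions left..right-1 is csum[right] - csum[left].
    csum = np.concatenate(([0.0], ts.to_numpy(dtype=float).cumsum()))
    counts = right - left
    # An empty window (possible if an offset makes a bound negative) yields NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        means = (csum[right] - csum[left]) / counts
    return pd.Series(means, index=ts.index)

ts = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2024-01-01", periods=4, freq="5min"),
)
result = sliding_mean_searchsorted(
    ts, pd.Timedelta("5min"), pd.Timedelta("0min")
)
```

Because the means come from differences of a running sum, results may differ from the loop version by floating-point rounding on long series.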