17

I am trying to set center=True in pandas rolling function, for a time-series:

import pandas as pd
series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
series.rolling('7D', min_periods=1, center=True, closed='left')

But output is:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-6-6b30c16a2d12> in <module>()
      1 import pandas as pd
      2 series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
----> 3 series.rolling('7D', min_periods=1, center=True, closed='left')

~\Anaconda3\lib\site-packages\pandas\core\generic.py in rolling(self, window, min_periods, freq, center, win_type, on, axis, closed)
   6193                                    min_periods=min_periods, freq=freq,
   6194                                    center=center, win_type=win_type,
-> 6195                                    on=on, axis=axis, closed=closed)
   6196 
   6197         cls.rolling = rolling

~\Anaconda3\lib\site-packages\pandas\core\window.py in rolling(obj, win_type, **kwds)
   2050         return Window(obj, win_type=win_type, **kwds)
   2051 
-> 2052     return Rolling(obj, **kwds)
   2053 
   2054 

~\Anaconda3\lib\site-packages\pandas\core\window.py in __init__(self, obj, window, min_periods, freq, center, win_type, axis, on, closed, **kwargs)
     84         self.win_freq = None
     85         self.axis = obj._get_axis_number(axis) if axis is not None else None
---> 86         self.validate()
     87 
     88     @property

~\Anaconda3\lib\site-packages\pandas\core\window.py in validate(self)
   1090             # we don't allow center
   1091             if self.center:
-> 1092                 raise NotImplementedError("center is not implemented "
   1093                                           "for datetimelike and offset "
   1094                                           "based windows")

NotImplementedError: center is not implemented for datetimelike and offset based windows

Expected output is the one generated by:

import pandas as pd
series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
series.rolling(7, min_periods=1, center=True).sum().head(10)

2014-01-01    4.0
2014-01-02    5.0
2014-01-03    6.0
2014-01-04    7.0
2014-01-05    7.0
2014-01-06    7.0
2014-01-07    7.0
2014-01-08    7.0
2014-01-09    7.0
2014-01-10    7.0
Freq: D, dtype: float64

But using datetime like offsets, since it simplifies part of my other code (not posted here).

Is there any alternative solution?

Thanks

cs95
  • 379,657
  • 97
  • 704
  • 746
karen
  • 822
  • 1
  • 6
  • 22

3 Answers3

11

Try the following (tested with pandas==0.23.3):

series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')

This will center your rolling sum in the 7-day window (by shifting -3.5 days), and will allow you to use a 'datetimelike' string for defining the window size. Note that shift() only takes an integer, thus defining with hours.

This will produce your desired output:

series.rolling('7D', min_periods=1, closed='left').sum().shift(-84, freq='h')['2014-01-01':].head(10)

2014-01-01 12:00:00    4.0
2014-01-02 12:00:00    5.0
2014-01-03 12:00:00    6.0
2014-01-04 12:00:00    7.0
2014-01-05 12:00:00    7.0
2014-01-06 12:00:00    7.0
2014-01-07 12:00:00    7.0
2014-01-08 12:00:00    7.0
2014-01-09 12:00:00    7.0
2014-01-10 12:00:00    7.0
Freq: D, dtype: float64

Note that the rolling sum is assigned to the center of the 7-day windows (using midnight to midnight timestamps), so the centered timestamp includes '12:00:00'.

Another option (as you show at the end of your question) is to resample the data to make sure it has even Datetime frequency, then use an integer for window size (window = 7) and center=True. However, you state that other parts of your code benefit from defining window with a 'datetimelike' string, so perhaps this option is not ideal.

pjw
  • 2,133
  • 3
  • 27
  • 44
  • Sorry, but I'm just not sure if this works. The indices of the input and output should be identical, which is not true in my test case – Brad Solomon Aug 02 '18 at 20:17
  • 1
    To be specific: I think it will fail if the data's frequency intervals are not consistent, i.e. minute-periodicity but skips random minutes. In that case the option is to resample, but that potentially blows up space complexity – Brad Solomon Aug 03 '18 at 11:15
  • The answer does provide the desired output, and the indices match if HH:MM:SS is dropped from the timestamp. This is just a higher precision timestamp for the centered window. However, inconsistent data frequency may be a problem with `shift`..... I haven't tested this. Resampling for even frequency would resolve this, but will insert 'NaN' for missing data. If there is a 'NaN' in the window, result of `sum()` will be 'NaN'. However, you can use `min_periods` to still return a sum if there are only occasional 'NaN' periods in the window. – pjw Aug 06 '18 at 17:14
4

From pandas version 1.3 this is * directly possible with pandas.

* Or will be (the work is merged, but 1.3 is not yet released as of today; I tested the lines below against the pandas main branch).

import pandas as pd
series = pd.Series(1, index = pd.date_range('2014-01-01', '2014-04-01', freq = 'D'))
series.rolling(7, min_periods=1, center=True).sum().head(10)

Output is as expected:

2014-01-01    4.0
2014-01-02    5.0
2014-01-03    6.0
2014-01-04    7.0
2014-01-05    7.0
2014-01-06    7.0
2014-01-07    7.0
2014-01-08    7.0
2014-01-09    7.0
2014-01-10    7.0
Freq: D, dtype: float64
Felix
  • 331
  • 2
  • 8
1

You could try to resample your serie/dataframe in order to convert the offset window to a fixed width window.

# Parameters 
window_timedelta = '7D'
resample_timedelta = '1D' 

# Convert offset to window size
window_size = pd.Timedelta(structure_duration) // pd.Timedelta(resample_timedelta)

# Resample serie
series_res = series.resample(resample_timedelta, on='datetime').first() 

# Perform the sum on the resampled serie
series_res['window_sum'] = series_res.rolling(window_size, center=True, min_periods=1).sum()

Note: the first hack in the resampling only works if you know that you have at maximum 1 pt/day. If you have more, you can replace it by sum or whatever is relevant to your data.

Note 2: the introduced NaN for missing dates will not cause the sum value to be NaN, Pandas ignores them while summing

Nicolas
  • 193
  • 1
  • 10