0

I'm trying to calculate the rolling average of a column of datetime objects. In my scenario, the input data are the last day below freezing each year for ~100 years.

Ignoring the years, I want to take a rolling average over the month-day rows. The output would be the average last day below freezing over some window of years.

My final plot will be years on the x-axis and the rolling average of the last day below freezing on the y-axis.

Here is a minimal setup of rows I want to average over.

df = pd.Series(['1900-05-04','1901-05-03', ..., '1910-04-30'])
df = pd.to_datetime(df)

>>> df

87   1900-05-04
88   1901-05-03
89   1902-05-06
90   1903-05-01
91   1904-05-03
92   1905-04-29
93   1906-05-03
94   1907-05-03
95   1908-05-04
96   1909-04-30
97   1910-04-30
dtype: datetime64[ns]

Desired output (with or without the years) with a window size of two would be:

1   1901-05-04
2   1902-05-05
3   1903-05-04
4   1904-05-02
5   1905-05-01
6   1906-05-03
7   1907-05-04
8   1908-05-02
9   1909-04-30
10  1910-04-30
ljusten
  • 177
  • 2
  • 10

1 Answers1

1

Here is a hack! It should be taken as inspiration and mostly like there exist a better one as it’s not optimal

pd.to_datetime(pd.DataFrame((df.loc[1:].dt.year, df.loc[1:].dt.month,
df.groupby(df.dt.month).transform(lambda d: d.dt.day.rolling(2).mean().round(0)).loc[1:].astype(int))).T.rename(columns={0:'year',1:'month',2:'day'}))

I group by month and took a rolling mean. I rebuilt the date by assembling the year, month and rolled day.

Results: enter image description here

Prayson W. Daniel
  • 14,191
  • 4
  • 51
  • 57
  • Thanks for taking the time. There remains however an issue. When I apply this to my larger dataset, I'm getting an `IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer` from this part: `ld_data.groupby(ld_data.dt.month).transform(lambda d: d.dt.day.rolling(2).mean().round(0)).loc[1:]` After checking, there are indeed some nan values amidst the data. The reason I suspect that this is happening is that the groupby month method ignores the index and there are some data that waver around the month boundary sometimes above and below. Any ideas for a fix? – ljusten Apr 06 '22 at 19:49
  • For example, try running the algorithm on the sample data I've updated the question with. – ljusten Apr 06 '22 at 19:51