I have a time series of 10 years of days, with five columns of data. Here is code to randomly generate:
import numpy as np
import pandas as pd
np.random.seed(1)
rows,cols = 4950,5
data = np.random.rand(rows,cols)
tidx = pd.date_range('2010-01-01', periods=rows, freq='D')
df = pd.DataFrame(data, columns=['a','b','c','d','e'], index=tidx)
print (df)
For each day in the time series I would like to calculate the mean for the same day in the PREVIOUS five years. For example for the date 1/1/2023: take the average of 1/1/2022, 1/1/2021, 1/1/2020, 1/1/2019 and 1/1/2018. I've tried many variations on:
df.groupby([df.index.month, df.index.day]).mean()
But I can't get it use just the five years prior to the index.year in the mean calculation for each row. I would like to add five columns with the "rolling" average, and name each column for example "A_Avg"