I have a dataframe where I take the subset of only numeric columns, calculate the 5 day rolling average for each numeric column and add it as a new column to the df.
This approach works but currently takes quite a long time (8 seconds per column). I'm wondering if there is a better way to do this.
A working toy example of what I'm doing currently:
data = {'Group': ['A','A','A','A','A','A','B','B','B','B','B','C','C','C','C','C'],
'Year' : ['2017', '2017', '2017', '2018', '2018', '2018', '2017', '2017', '2018', '2018', '2017', '2017', '2017', '2017', '2018', '2018'],
'Score 1' : [1,2,3,4,5,6,1,2,3,4,5,1,2,3,4,5],
'Score 2': [1,4,5,7,5,5,6,1,4,5,6,7,4,6,4,6]}
df = pd.DataFrame(data)
for col in ['Score 1', 'Score 2']:
df[col + '_avg'] = df.groupby(['Year', 'Group'])[col].apply(lambda x: x.rolling(2, 1).mean().shift().bfill())