I am looking at annualized baseball statistics and would like to calculate a rolling mean looking back at the previous 3 years' worth of performance in regard to number of Hits. However, I want to account for the fact that while my dataset reaches back more than 3 years, one single player may have only been in the league for 1-2 years and will not have 3 years' worth of observations off of which I can calculate the rolling mean. For example:
In[6]: df = pd.DataFrame({'PLAYER_ID': ['A', 'A', 'A', 'B', 'B'],
'HITS': [45, 55, 50, 20, 24]})
In[9]: df
Out[9]:
PLAYER_ID HITS
0 A 45
1 A 55
2 A 50
3 B 20
4 B 24
How would I use a groupby and aggregation/transform (or some other process) to calculate the rolling mean for each player with a max 3 years historic totals and then just use the max available historic observations for a player with less than 3 years' historic performance data available?
Pretty sure my answer lies within the Pandas package but would be interested in any solution.
Thanks!