I am relatively new to pandas and am having trouble creating a new column based on a summed groupby.
Here is a snippet of my dataset:
In [1478]: mkt_vals_joined[['GameId', 'Year', 'HomeTeam', 'attMktValH']].head(10)
Out[1478]:
GameId Year HomeTeam attMktValH
0 1 2005 West Ham 18.50
1 2 2005 Aston Villa 31.85
2 3 2005 Everton 31.38
3 4 2005 Fulham 6.45
4 5 2005 Man City 30.80
5 6 2005 Middlesbrough 43.20
6 7 2005 Portsmouth 30.70
7 8 2005 Sunderland 5.80
8 9 2005 Arsenal 88.75
9 10 2005 Wigan 9.80
It has data up until 2018. The attMktValH column is the value of a team's offense in a particular year. So for example this code shows the value of Arsenal's offense each year:
In [1483]: mkt_vals_joined.groupby(['HomeTeam', 'Year'])['attMktValH'].first()
Out[1483]:
HomeTeam Year
Arsenal 2005 88.75
2006 77.25
2007 42.45
2008 92.50
2009 102.50
2010 110.30
2011 149.50
2012 85.50
2013 76.90
2014 129.65
2015 125.00
2016 143.50
2017 238.00
2018 176.70
My problem is that I need to create a column for each game in my original dataframe which is the % of the league's summed offensive value for each year.
For example, Arsenal had a value of 88.75 in 2005, and the whole league had a value of around 820, so for each Arsenal game in 2005, their value would be 88.75 / 820 * 100
If I simply sum a groupby the Year and Team it will sum each individual game and give me an incorrect result.
The code I am currently using to do this is as follows:
home_mkt_vals['attMkt%'] = home_mkt_vals['attMktValH'] / home_mkt_vals.groupby(['Year'])['attMktValH'].transform(lambda x: np.mean(x) * 20) * 100
However this seems extremely ugly to me and only works because there are 20 teams in each season.
Thanks for any help in advance.