0

I am relatively new to pandas and am having trouble creating a new column based on a summed groupby.

Here is a snippet of my dataset:

In [1478]: mkt_vals_joined[['GameId', 'Year', 'HomeTeam', 'attMktValH']].head(10)
Out[1478]: 
   GameId  Year       HomeTeam  attMktValH
0       1  2005       West Ham       18.50
1       2  2005    Aston Villa       31.85
2       3  2005        Everton       31.38
3       4  2005         Fulham        6.45
4       5  2005       Man City       30.80
5       6  2005  Middlesbrough       43.20
6       7  2005     Portsmouth       30.70
7       8  2005     Sunderland        5.80
8       9  2005        Arsenal       88.75
9      10  2005          Wigan        9.80

It has data up until 2018. The attMktValH column is the value of a team's offense in a particular year. So for example this code shows the value of Arsenal's offense each year:

In [1483]: mkt_vals_joined.groupby(['HomeTeam', 'Year'])['attMktValH'].first()
Out[1483]: 
HomeTeam          Year
Arsenal           2005     88.75
                  2006     77.25
                  2007     42.45
                  2008     92.50
                  2009    102.50
                  2010    110.30
                  2011    149.50
                  2012     85.50
                  2013     76.90
                  2014    129.65
                  2015    125.00
                  2016    143.50
                  2017    238.00
                  2018    176.70

My problem is that I need to create a column for each game in my original dataframe which is the % of the league's summed offensive value for each year.

For example, Arsenal had a value of 88.75 in 2005, and the whole league had a value of around 820, so for each Arsenal game in 2005, their value would be 88.75 / 820 * 100

If I simply sum a groupby the Year and Team it will sum each individual game and give me an incorrect result.

The code I am currently using to do this is as follows:

home_mkt_vals['attMkt%'] = home_mkt_vals['attMktValH'] / home_mkt_vals.groupby(['Year'])['attMktValH'].transform(lambda x: np.mean(x) * 20) * 100

However this seems extremely ugly to me and only works because there are 20 teams in each season.

Thanks for any help in advance.

James Ward
  • 357
  • 2
  • 3
  • 11
  • Maybe this helps? https://stackoverflow.com/questions/23377108/pandas-percentage-of-total-with-groupby – Scratch'N'Purr Jul 25 '18 at 06:54
  • Thanks a lot, that helps. – James Ward Jul 25 '18 at 07:00
  • Still unable to solve my problem unfortunately. I now need to somehow join those back to the original dataframe. I could do it without pandas in a dictionary or something but I want to do it in pandas. Any idea? – James Ward Jul 25 '18 at 07:19
  • You can use the [`merge`](https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.DataFrame.merge.html) method. Just make sure to [`reset_index`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html) your grouped by dataframe before joining it back to your original dataframe. – Scratch'N'Purr Jul 25 '18 at 08:24

0 Answers0