3

I am trying to calculate rolling average of goals scored by each team using Pandas in Python. My data looks like this:

 league_round      home_team_name          away_team_name  home_team_goals  \
0          MLS          Toronto FC           Columbus Crew              0.0   
1          MLS      Houston Dynamo          Atlanta United              4.0   
2          MLS  Philadelphia Union  New England Revolution              2.0   
3          MLS        Orlando City               DC United              1.0   
4          MLS           FC Dallas          Real Salt Lake              1.0   

   away_team_goals  
0              2.0  
1              0.0  
2              0.0  
3              1.0  
4              1.0  

I tried following:

df_rolling = df.groupby('home_team_name')['home_team_goals'].rolling(window=3, min_periods=1).mean()
Output:

home_team_name         
Atlanta United       17     3.000000
                     24     3.500000
                     46     4.000000
                     64     3.666667
                     77     3.666667
                     100    2.000000
                     124    1.666667
                     147    1.333333
                     177    1.666667
                     182    2.666667
                     218    2.000000
                     221    2.666667
                     247    2.000000
                     280    2.666667
                     330    2.333333
                     352    2.333333
                     374    2.000000
                     402    2.333333
                     404    2.666667
                     408    2.666667
Chicago Fire         14     3.000000
                     38     2.500000

Above code returns moving average for only specific home team when it plays at home. My question is how can lookup the team when it plays away and include it in my rolling average? and how can I paste the results to my original df? Thanks a lot

Farhad M.
  • 161
  • 2
  • 11
  • Please include your input data as well, we cannot copy your picture – Erfan Apr 13 '19 at 20:32
  • sorry for my inexperience with stack overflow, couldn't paste table efficiently – Farhad M. Apr 13 '19 at 20:42
  • If you have your data in your IDE, for example `Jupyter Notebook`, just `print(df)` and copy - paste the output here. Or else find more information [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Erfan Apr 13 '19 at 20:43
  • done, hope it helps. Thanks – Farhad M. Apr 13 '19 at 20:50
  • Can you do `print(df.drop('league_round', axis=1))` and print that? This way the columns dataframe doesnt get sliced at the last columns – Erfan Apr 13 '19 at 20:53
  • sorry still getting sliced, do you know another way? – Farhad M. Apr 13 '19 at 21:14

1 Answers1

0

I would tackle this problem as followed:

  1. seperate each dataframe based on columns about home games and away games.

  2. replace home_ and away_ in the column names so when you concat the two dataframes in the next step, they wont be seperated in different columns.

  3. Now you have all the data harmonized and we can calculate the rolling average


df1 = df.filter(regex='league|home')
df2 = df.filter(regex='league|away')

df1.columns = df1.columns.str.replace('home_', '')
df2.columns = df2.columns.str.replace('away_', '')

df = pd.concat([df1, df2], ignore_index=True)

print(df)

Which yields:

  league_round               team_name  team_goals
0          MLS              Toronto FC         0.0
1          MLS          Houston Dynamo         4.0
2          MLS      Philadelphia Union         2.0
3          MLS            Orlando City         1.0
4          MLS               FC Dallas         1.0
5          MLS           Columbus Crew         2.0
6          MLS          Atlanta United         0.0
7          MLS  New England Revolution         0.0
8          MLS               DC United         1.0
9          MLS          Real Salt Lake         1.0

Then use rolling average:

df_rolling = df.groupby('team_name')['team_goals'].rolling(window=3, min_periods=1).mean()
print(df_rolling)

Final output:

team_name                
Atlanta United          6    0.0
Columbus Crew           5    2.0
DC United               8    1.0
FC Dallas               4    1.0
Houston Dynamo          1    4.0
New England Revolution  7    0.0
Orlando City            3    1.0
Philadelphia Union      2    2.0
Real Salt Lake          9    1.0
Toronto FC              0    0.0
Name: team_goals, dtype: float64

Downside of this method is that you cannot track which were the home games and which were the away games, but I dont think thats important for calculating the rolling average.

Erfan
  • 40,971
  • 8
  • 66
  • 78
  • thanks, but this partially solves my issue. Above method creates different `df`, but i want to paste the results to the original `df` while preserving same structure, still distinguishing between home and away team. Such that `home_team` MA goals for past 5 games vs `away_team` in one row. I am still stuck at this point. Any advice? Thanks – Farhad M. Apr 14 '19 at 09:41