I'm fairly new to pandas, Python, and coding in general, and I have a question about summing columns with pandas.

I have a 306x7 dataframe of past soccer results. I want to sum the home goals and the away goals for each club and put the totals into a new dataframe (18 rows for the 18 clubs and 2 columns for full-season home goals and away goals).

Could anyone give me an idea on how to proceed?

teams = Liga2['HomeTeam'].unique()

df = pd.DataFrame(index=teams, columns=['FTHG','FTAG'])

for team in teams:
    df.loc[team, 'FTHG'] = [Liga2.HomeTeam == team].FTHG.sum()
    df.loc[team, 'FTAG'] = [Liga2.AwayTeam == team].FTHG.sum()

Error:


AttributeError                            Traceback (most recent call last)
<ipython-input-12-a1b735dbadf3> in <module>
      4 
      5 for team in teams:
----> 6     df.loc[team, 'FTHG'] = [Liga2.HomeTeam == team].FTHG.sum()
      7     df.loc[team, 'FTAG'] = [Liga2.AwayTeam == team].FTHG.sum()

AttributeError: 'list' object has no attribute 'FTHG'

This is the df:

https://i.stack.imgur.com/x7pLv.jpg

Thank you for your ideas.

  • please share a small sample dataframe and paste it as text, no images please – anky Feb 17 '19 at 14:09
  • If you want to sum the two columns `FTHG` and `FTAG` you can try `df['TOTAL_GOALS'] = df['FTHG'] + df['FTAG']`. If this is not what you want, please post sample input and output. – Sumanth Feb 17 '19 at 14:22
  •    Date        HomeTeam       AwayTeam       FTHG  FTAG  FTR  AS
    0  2017-08-18  Bayern Munich  Leverkusen     3     1     H    19
    1  2017-08-19  Hamburg        Augsburg       1     0     H    13
    2  2017-08-19  Hertha         Stuttgart      2     0     H    9
    3  2017-08-19  Hoffenheim     Werder Bremen  1     0     H    11
    4  2017-08-19  Mainz          Hannover       0     1     A    6
    –  Feb 17 '19 at 15:15

1 Answer

The easiest way to think this through without groupby is to build a list of unique teams and an empty df with home-goal and away-goal columns, then fill in each team's sum of home goals and sum of away goals.

# list of unique teams (assuming home and away teams are identical)
teams = liga2['HomeTeam'].unique()

# create the dataframe
df = pd.DataFrame(index=teams, columns=['home_goals','away_goals'])

# for each team, populate the df with the sum of their home and away goals
for team in teams:
    df.loc[team,'home_goals'] = liga2[ liga2.HomeTeam == team ].FTHG.sum()
    df.loc[team,'away_goals'] = liga2[ liga2.AwayTeam == team ].FTAG.sum()
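
The difference between this loop and the one in the question is the indexing expression. As a minimal sketch, assuming a three-row sample built from the rows posted in the comments: square brackets on their own create a Python list, while `liga2[...]` filters the dataframe's rows.

```python
import pandas as pd

# Three of the sample rows posted in the comments (only the columns needed here).
liga2 = pd.DataFrame({
    "HomeTeam": ["Bayern Munich", "Hamburg", "Hertha"],
    "AwayTeam": ["Leverkusen", "Augsburg", "Stuttgart"],
    "FTHG": [3, 1, 2],
    "FTAG": [1, 0, 0],
})

team = "Hamburg"

# Square brackets alone build a plain Python list holding one boolean Series,
# which is why `.FTHG` raises AttributeError in the question's code.
as_list = [liga2.HomeTeam == team]

# Indexing the dataframe with the boolean Series keeps only the matching rows.
home_rows = liga2[liga2.HomeTeam == team]
home_goals = home_rows.FTHG.sum()
```

Here `as_list` has no `FTHG` attribute, whereas `home_rows` is a dataframe whose `FTHG` column can be summed.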

With groupby, all you need is:

# create the groupby sums, where the team name is the index
# select the goal column before summing so only that column is aggregated
home = liga2.groupby('HomeTeam')['FTHG'].sum()
away = liga2.groupby('AwayTeam')['FTAG'].sum()

# concat them as columns in a df
df = pd.concat([home, away], axis=1)
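
As a rough end-to-end sketch of the groupby version, here it is on the five sample rows posted in the comments. The names `home_goals`, `away_goals`, and `totals` are illustrative, and the `fillna(0)` step is an assumption added only because in this tiny sample no team appears both at home and away; with a full season it should not be needed.

```python
import pandas as pd

# The five matches posted in the question's comments.
liga2 = pd.DataFrame({
    "HomeTeam": ["Bayern Munich", "Hamburg", "Hertha", "Hoffenheim", "Mainz"],
    "AwayTeam": ["Leverkusen", "Augsburg", "Stuttgart", "Werder Bremen", "Hannover"],
    "FTHG": [3, 1, 2, 1, 0],
    "FTAG": [1, 0, 0, 0, 1],
})

# One Series per side, indexed by team name.
home = liga2.groupby("HomeTeam")["FTHG"].sum().rename("home_goals")
away = liga2.groupby("AwayTeam")["FTAG"].sum().rename("away_goals")

# concat aligns the two Series on the team index; teams missing on one side
# get NaN, which is replaced by 0 here.
totals = pd.concat([home, away], axis=1).fillna(0).astype(int)
```

Because `concat` aligns on the index, the result has one row per team regardless of the order in which teams appear in either column.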

russellthehippo
  • Thank you for your reply. I added the new error message to the initial post above. Could you also tell me how to insert a df into a Stack Overflow question without breaking its formatting? :-D –  Feb 17 '19 at 15:30
  • Your code is not similar to my answer in lines 6 and 7; yours is just a list, as the error says. It needs to be a slice of the liga2 dataframe like `liga2[liga2.HomeTeam == team]` – russellthehippo Feb 17 '19 at 15:40
  • "make it "copy and pasteable" using pd.read_clipboard(sep='\s\s+'), you can format the text for StackOverflow highlight and use Ctrl+K (or prepend four spaces to each line)" [from this question](https://stackoverflow.com/a/20159305/7372418) – russellthehippo Feb 17 '19 at 15:48
  • If you don't mind, please select my answer as the correct answer if I helped. – russellthehippo Feb 17 '19 at 15:50
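
The `sep='\s\s+'` pattern quoted in the comments can be tried without a clipboard. This is only a sketch, assuming the dataframe text was pasted with at least two spaces between columns; it uses `pd.read_csv` on an in-memory string in place of `pd.read_clipboard`, and the two rows come from the sample in the comments.

```python
import io

import pandas as pd

# Pasted text with two or more spaces between columns; team names such as
# "Bayern Munich" contain only a single space, so they stay in one field.
text = """Date  HomeTeam  AwayTeam  FTHG  FTAG
2017-08-18  Bayern Munich  Leverkusen  3  1
2017-08-19  Hoffenheim  Werder Bremen  1  0
"""

# A regular-expression separator requires the Python parsing engine.
sample = pd.read_csv(io.StringIO(text), sep=r"\s\s+", engine="python")
```

Splitting on two or more whitespace characters is what keeps multi-word team names intact when the text is read back in.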