1

I am new to python and would like to find out the difference between two column of a dataframe. What I want is to find the difference between two column along with a respective third column. For example, I have a dataframe Soccer which contains the list of all the team playing soccer with the goals against and for their club. I wanted to find out the goal difference along with the team name. i.e. (Goals Diff=goalsFor-goalsAgainst).

 Pos             Team  Seasons Points GamesPlayed GamesWon GamesDrawn  \
0    1      Real Madrid       86   5656        2600     1647        552   
1    2        Barcelona       86   5435        2500     1581        573   
2    3  Atletico Madrid       80   5111        2614     1241        598   


GamesLost GoalsFor GoalsAgainst
0       563     5947         3140   
1       608     5900         3114     
2       775     4534         3309    

I tried creating a function and then iterating through each row of a dataframe as below:

for index, row in football.iterrows():
        ##pdb.set_trace()
        goalsFor=row['GoalsFor']
        goalsAgainst=row['GoalsAgainst']
        teamName=row['Team']
        if not total:
            totals=np.array(Goal_diff_count_Formal(int(goalsFor), int(goalsAgainst), teamName))
        else:
            total= total.append(Goal_diff_count_Formal(int(goalsFor), int(goalsAgainst), teamName))

    return total

def Goal_diff_count_Formal(gFor, gAgainst, team):
goalsDifference=gFor-gAgainst
return [team, goalsDifference]

However, I would like to know if there is a quickest way to get this, something like

dataframe['goalsFor'] - dataframe['goalsAgainst'] #along with the team name in the dataframe
Running Rabbit
  • 2,634
  • 15
  • 48
  • 69

1 Answers1

1

Solution if unique values in Team column - create index by Team, get difference and select Team by index:

df = df.set_index('Team')
s = df['GoalsFor'] - df['GoalsAgainst'] 
print (s)
Team
Real Madrid        2807
Barcelona          2786
Atletico Madrid    1225
dtype: int64

print (s['Atletico Madrid'])
1225

Solution if possible duplicated values in Team column:

I believe you need grouping by Team and aggregate sum first and then get difference:

#change sample data for Team in row 3
print (df)
   Pos         Team  Seasons  Points  GamesPlayed  GamesWon  GamesDrawn  \
0    1  Real Madrid       86    5656         2600      1647         552   
1    2    Barcelona       86    5435         2500      1581         573   
2    3  Real Madrid       80    5111         2614      1241         598   

   GamesLost  GoalsFor  GoalsAgainst  
0        563      5947          3140  
1        608      5900          3114  
2        775      4534          3309  


df = df.groupby('Team')['GoalsFor','GoalsAgainst'].sum()
df['diff'] = df['GoalsFor'] - df['GoalsAgainst'] 
print (df)
             GoalsFor  GoalsAgainst  diff
Team                                     
Barcelona        5900          3114  2786
Real Madrid     10481          6449  4032

EDIT:

s = df['GoalsFor'] - df['GoalsAgainst'] 
print (s)
Team
Barcelona      2786
Real Madrid    4032
dtype: int64

print (s['Barcelona'])
2786
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • @Yash - there are duplicated `Team` values like in my edited data? – jezrael Oct 14 '19 at 06:11
  • I am getting this error on finding the difference between two column TypeError: unsupported operand type(s) for -: 'str' and 'str' – Running Rabbit Oct 14 '19 at 06:40
  • @Yash - It means in columns `GoalsFor` and `GoalsAgainst` are not numeric values, so need `pd.to_numeric(df['GoalsFor'], errors='coerce') - pd.to_numeric(df['GoalsAgainst'], errors='coerce')`. more info is [here](https://stackoverflow.com/a/28648923) – jezrael Oct 14 '19 at 06:42