0

Using the EPL Kaggle dataset, I added a Winner column in the end to display names of the match winner using:

epl['Winner'] = epl.apply(lambda x: x['HomeTeam'] if x['FTR'] == 'H' else (x['AwayTeam'] if x['FTR'] == 'A' else 'Draw'), axis=1) epl

I have another dataframe 'comp' that sums up values of multiple columns using:

comp = (pd.concat([epl.rename(columns={'HomeTeam':'Team','HY':'YellowCards','HF':'Fouls'}), epl.rename(columns={'AwayTeam':'Team','AY':'YellowCards','AF':'Fouls'})]).groupby(['Season','Team'])[['YellowCards','Fouls']].sum().astype(int).reset_index())

comp

I want to find a way to add the value count of each team's wins every season to the comp dataframe. So it should show:

Season     Team      Shots   On Target   YellowCards     RedCards    Fouls    Wins
2000-01   Arsenal     540      295           17            32         495      20
2000-01  Aston Villa  349      165           24            43         491      13
2000-01   Bradford    371      174           27            30         517       5
2000-01   Charlton    373      217           24            22         467      14

etc. all the way to 2021-22

EDIT: To set up the dataframe:

df = pd.read_csv('results.csv')

epl=df.dropna().reset_index(drop=True)

Suj
  • 37
  • 5
  • 1
    Please provide [minimal reproducible examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) of the input (or the full code to set up the dataframe from the Kaggle link) and the matching expected output – mozway May 12 '23 at 13:49

1 Answers1

1

Try the following:

epl_grouped = epl.groupby('Winner').count().reset_index()[['Winner', 'FTHG']] # Creating a grouped dataframe with only the Winner column and one random column which holds the 'count' value
comp = comp.merge(epl_grouped, left_on='Team', right_on='Winner', how='left') # Merging both dataframes
comp = comp.rename(columns={'FTHG': 'Wins'}) # renaming the random column I chose to 'Wins'
comp

I assumed that the top dataframe is called 'epl' and the bottom one 'comp'

UPDATE:

If you want to do it by team and session you need the groupby to be by those two columns:

epl_grouped = epl.groupby(['Winner','Session']).count().reset_index()[['Winner','Session', 'FTHG']]

and the merge should be by both columns:

comp = comp.merge(epl_grouped, left_on=['Team', 'Session'], right_on=['Winner', 'Session'], how='left')
gtomer
  • 5,643
  • 1
  • 10
  • 21
  • so for the 'Wins' column, it's showing me a total sum of all Arsenal wins (470) for EVERY season where as I wanted it to show '20 Wins for 2000-01' or '26 Wins for 2001-02' etc. What would be the solution here? – Suj May 12 '23 at 14:44
  • Where can I find in the EPL dataframe the season? – gtomer May 12 '23 at 15:11
  • It's available in the original dataset from kaggle, linked given above. the screenshot provided does not show because the there were too many columns to show up on screen. – Suj May 12 '23 at 15:29
  • Please don;t send me there. Let me know what is the column name – gtomer May 12 '23 at 19:08
  • The column name is Season. It has values: 2000-01, 2001-02 etc. As you can see in the above screenshot. – Suj May 13 '23 at 06:02
  • @Suj - see updated answer above – gtomer May 13 '23 at 10:36