1

I'm new to python and pandas, and trying to "learn by doing."

I'm currently working with two football/soccer (depending on where you're from!) dataframes:

  1. player_table has several columns, among others 'player_name' and 'player_id'
     player_id player_name        
  0  223       Lionel Messi        
  1  157       Cristiano Ronaldo   
  2  962       Neymar              
  1. match_table also has several columns, among others 'home_player_1', '..._2', '..._3' and so on, as well as the corresponding 'away_player_1', '...2' , '..._3' and so on. The content of these columns is a player_id, such that you can tell which 22 (2x11) players participated in a given match through their respective unique IDs.

I'll just post a 2 vs. 2 example here, because that works just as well:

     match_id  home_player_1  home_player_2  away_player_1  away_player_2
  0  321       223            852            729            853
  1  322       223            858            157            159
  2  323       680            742            223            412

What I would like to do now is to add a new column to player_table which gives the number of appearances - player_table['appearances'] by counting the number of times each player_id is mentioned in the part of the dataframe match_table bound horizontally by (home player 1, away player 2) and vertically by (first match, last match)

Desired result:

     player_id player_name         appearances
  0  223       Lionel Messi        3
  1  157       Cristiano Ronaldo   1
  2  962       Neymar              0

Coming from other programming languages I think my standard solution would be a nested for loop, but I understand that is frowned upon in python...

I have tried several solutions but none really work, this seems to at least give the number of appearances as "home_player_1"

player_table['appearances'] = player_table['player_id'].map(match_table['home_player_1'].value_counts()) 

Is there a way to expand the map function to include several columns in a dataframe? Or do I have to stack the 22 columns on top of one another in a new dataframe, and then map? Or is map not the appropriate function?

Would really appreciate your support, thanks! Philipp

Edit: added specific input and desired output as requested

  • What does the table looks like and what is the expected output? Here is a good reference to start: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – garagnoth Oct 15 '21 at 00:27

1 Answers1

0

What you could do is use .melt() on the match_table player columns (so it'll turn your wide table in to a tall/long table of a single column). Then do a .value_counts on the that one column. Finally join it to the player_table on the 'player_id' column

import pandas as pd

player_table = pd.DataFrame({'player_id':[223,157,962],
                             'player_name':['Lionel Messi','Cristiano Ronaldo','Neymar']})

match_table = pd.DataFrame({
        'match_id':[321,322,323],
        'home_player_1':[223,223,680],
        'home_player_2':[852,858,742],
        'away_player_1':[729,157,223],
        'away_player_2':[853,159,412]})


player_cols = [x for x in match_table.columns if 'player_' in x]
match_table[player_cols].value_counts(sort=True)

df1 = match_table[player_cols].melt(var_name='columns', value_name='appearances')['appearances'].value_counts(sort=True).reset_index(drop=False).rename(columns={'index':'player_id'})
appearances_df = df1.merge(player_table, how='right', on='player_id')[['player_id','player_name','appearances']].fillna(0)

Output:

print(appearances_df)
   player_id        player_name  appearances
0        223       Lionel Messi          3.0
1        157  Cristiano Ronaldo          1.0
2        962             Neymar          0.0
chitown88
  • 27,527
  • 4
  • 30
  • 59