1

I'm analyzing a data set with information from soccer players. I have the soccer player name, his club and all skills. I want to get the mean value of all players from a club and check the skill in which the club is better. For instance, what club has the faster players, the tallest players, etc.

This is what my data looks like:

import pandas as pd

df = pd.DataFrame(
    {
        "Club": ["Palmeiras", "SPFC", "Corinthians", "Palmeiras", "SPFC"],
        "Player": ["FFFFF", "EEEE", "DDDD", "CCCC", "BBBB"],
        "Balance": [70, 80, 90, 50, 60],
        "Speed": [90, 89, 70, 88, 80],
        "Aggression": [70, 74, 80, 85, 66],
    }
)

In this example, I get the club with the highest average speed:

print("Club with highest speed: " + df.groupby("Club")["Speed"].mean().reset_index().sort_values("Speed", ascending=False).iloc[0, 0])

I want to print the same thing for all skills, the club with highest speed, the club with the highest balance and so on. I thought I could use something similar to what I did other times with df.iterrows(), however, I'm having a hard time combining this with the groupby function.

I've also found this example How to loop over grouped Pandas dataframe?, but it didn't work for me.

grooveplex
  • 2,492
  • 4
  • 28
  • 30
dekio
  • 810
  • 3
  • 16
  • 33
  • The problem is that I will have to write all variables and it's a big dataset. The example I gave was a small one, but the intention is to replicate in a much bigger... – dekio Jun 19 '19 at 18:26

2 Answers2

3

You want a combination of groupby().mean() to get all the mean stats by clubs and idxmax() to identify the clubs with maximum mean:

df.groupby('Club').mean().idxmax()

Output:

Balance       Corinthians
Speed           Palmeiras
Aggression    Corinthians
dtype: object
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
0

I think @Quang Hoang already solved your problem, but if you want to get everything in a single dataframe, you could do as below.

means = df.groupby('Club').mean().max()
best = df.groupby('Club').mean().idxmax()
res = pd.DataFrame([means, best], index=['Mean', 'Team']).T

In [1]: print(res)
Out[1]: 
            Mean    Team
Balance     90      Corinthians
Speed       89      Palmeiras
Aggression  80      Corinthians
AlCorreia
  • 532
  • 4
  • 12