How can I create a macro to loop through all columns in Pandas Dataframe?

Question

I'm analyzing a data set with information from soccer players. I have the soccer player name, his club and all skills. I want to get the mean value of all players from a club and check the skill in which the club is better. For instance, what club has the faster players, the tallest players, etc.

This is what my data looks like:

import pandas as pd

df = pd.DataFrame(
    {
        "Club": ["Palmeiras", "SPFC", "Corinthians", "Palmeiras", "SPFC"],
        "Player": ["FFFFF", "EEEE", "DDDD", "CCCC", "BBBB"],
        "Balance": [70, 80, 90, 50, 60],
        "Speed": [90, 89, 70, 88, 80],
        "Aggression": [70, 74, 80, 85, 66],
    }
)

In this example, I get the club with the highest average speed:

print("Club with highest speed: " + df.groupby("Club")["Speed"].mean().reset_index().sort_values("Speed", ascending=False).iloc[0, 0])

I want to print the same thing for all skills, the club with highest speed, the club with the highest balance and so on. I thought I could use something similar to what I did other times with df.iterrows(), however, I'm having a hard time combining this with the groupby function.

I've also found this example How to loop over grouped Pandas dataframe?, but it didn't work for me.

The problem is that I will have to write all variables and it's a big dataset. The example I gave was a small one, but the intention is to replicate in a much bigger... — dekio, Jun 19 '19 at 18:26

Quang Hoang · Accepted Answer · 2019-06-19T18:39:25.647

3

You want a combination of groupby().mean() to get all the mean stats by clubs and idxmax() to identify the clubs with maximum mean:

df.groupby('Club').mean().idxmax()

Output:

Balance       Corinthians
Speed           Palmeiras
Aggression    Corinthians
dtype: object

edited Jun 19 '19 at 18:39

answered Jun 19 '19 at 18:27

Quang Hoang

146,074
10
56
74

score 0 · Answer 2 · answered Jun 19 '19 at 18:47

I think @Quang Hoang already solved your problem, but if you want to get everything in a single dataframe, you could do as below.

means = df.groupby('Club').mean().max()
best = df.groupby('Club').mean().idxmax()
res = pd.DataFrame([means, best], index=['Mean', 'Team']).T

In [1]: print(res)
Out[1]: 
            Mean    Team
Balance     90      Corinthians
Speed       89      Palmeiras
Aggression  80      Corinthians

How can I create a macro to loop through all columns in Pandas Dataframe?

2 Answers2