Trying to get the row of each player with the highest salary per age

Question

I've got a dataframe of NBA players and salaries and I'm trying to get a dataframe/query that returns the player with the highest salary for each year in the age column. It should return one row per year (19, 20, 21, etc), the player's name, and the player's salary.

sa.df[['Age','salary']].groupby(['Age']).idxmax(numeric_only=True)

It returns a dataframe with the proper ages in the age column, but the salary column has the player names instead of their salary. I set the index as the 'Player' column but it isn't a column in the df on it's own because when I try it I get an error that the player column is ambiguous.

Show your actual output and expected output with some sample data for us to try out and help you by [editing](https://stackoverflow.com/posts/76974221/edit) question. — shaik moeed, Aug 25 '23 at 04:18
And is your dataframe variable name is `sa_df` or `sadf` instead of `sa.df`? — shaik moeed, Aug 25 '23 at 04:20

score 0 · Answer 1 · answered Aug 25 '23 at 04:33

It's quite hard to know what's going on in your example when you don't provide a sample dataframe. Below is how I would do it:

import random, pandas as pd
random.seed(0)

# for generating names
first_names = ['Adam', 'Bob', 'Charlie', 'David', 'Evan', 'Frank', 'George', 'Harry', 'Ivan', 'John', 'Kevin', 'Larry', 'Michael', 'Nathan', 'Oscar', 'Peter', 'Quentin', 'Robert', 'Steven', 'Thomas', 'Ulysses', 'Victor', 'William', 'Xavier', 'Yuri', 'Zachary']
last_names = ['Adams', 'Baker', 'Carter', 'Davis', 'Edwards', 'Franklin', 'Garcia', 'Harris', 'Ivanov', 'Johnson', 'Kowalski', 'Lee', 'Miller', 'Nelson', 'Olsen', 'Parker', 'Quinn', 'Robinson', 'Smith', 'Taylor', 'Ulyanov', 'Volkov', 'Williams', 'Xavier', 'Young', 'Zimmerman']

# generate sample data
df = pd.DataFrame({
    'first_name': [random.choice(first_names) for _ in range(100)],
    'last_name': [random.choice(last_names) for _ in range(100)],
    'age': [random.randint(18, 35) for _ in range(100)],
    'salary': [random.randint(1000, 10000) for _ in range(100)]
})

# get the rows where the salary is the max salary for each age
df.loc[df.groupby('age')['salary'].idxmax()]

Output:

   first_name last_name  age  salary
84     Robert     Smith   18    8849
13     George    Parker   19    8792
61      Peter       Lee   20    9986
4        Ivan    Harris   21    9656
42     George  Williams   22    7153
79      Kevin     Adams   23    8883
63      Harry  Franklin   24    5807
52    Charlie  Kowalski   25    6541
50     Robert     Smith   26    7016
38     Nathan     Olsen   27    7102
87      David    Carter   28    7988
39      Kevin    Parker   29    9813
18       Yuri   Edwards   30    8421
70      Harry  Williams   31    9569
80    Quentin     Davis   32    8766
22       Ivan  Williams   33    9533
37      Larry    Ivanov   34    8134
93     Thomas    Garcia   35    7153

Trying to get the row of each player with the highest salary per age

1 Answers1