From a Pandas Dataframe, return specific column values based on grouping and largest values of other columns

Question

Given the following code:

# Import pandas library
import pandas as pd


# Data as a list of dicts, one per student
data = [{'Student': 'Eric', 'Grade': 96, 'Class': 'A'},
        {'Student': 'Caden', 'Grade': 92, 'Class': 'A'},
        {'Student': 'Sam', 'Grade': 90, 'Class': 'A'},
        {'Student': 'Leon', 'Grade': 88, 'Class': 'A'},
        {'Student': 'Laura', 'Grade': 80, 'Class': 'B'},
        {'Student': 'Leann', 'Grade': 22, 'Class': 'B'},
        {'Student': 'Glen', 'Grade': 9, 'Class': 'C'},
        {'Student': 'Jack', 'Grade': 90, 'Class': 'C'},
        {'Student': 'Jill', 'Grade': 87, 'Class': 'C'},
        {'Student': 'Joe', 'Grade': 58, 'Class': 'C'},
        {'Student': 'Andrew', 'Grade': 48, 'Class': 'D'},
        {'Student': 'Travis', 'Grade': 39, 'Class': 'E'},
        {'Student': 'Henry', 'Grade': 23, 'Class': 'E'},
        {'Student': 'Chris', 'Grade': 19, 'Class': 'E'},
        {'Student': 'Jim', 'Grade': 1, 'Class': 'E'},
        {'Student': 'Sarah', 'Grade': 93, 'Class': 'E'},
        {'Student': 'Brit', 'Grade': 92, 'Class': 'E'},
        ]

# Create the DataFrame
df = pd.DataFrame(data)

# Two largest grades per class (this returns only the Grade values)
print(df.groupby('Class')['Grade'].nlargest(2))

From the DataFrame, I would like to return the names of the students with the top 2 grades in each class. I would like two different versions of the result.

Version 1 would have all of the original columns:

Version 2 would return only the names:

Output (ideally, the two versions described above):

post data as text, not images: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Paul H, Apr 19 '19 at 16:41
you grouped by the student, so you're grabbing each student's two best grades — Paul H, Apr 19 '19 at 16:43
Working on the text. Each student has only one grade. I am grabbing the two best grades from each class. — enter_display_name_here, Apr 19 '19 at 16:46
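The nlargest call in the question keeps only the Grade values: its result is a Series whose index has two levels, the Class group key and the original row label. That is why the Student names disappear, and also why the row labels can be recovered. A minimal sketch that inspects the index, assuming the question's data rebuilt as (Student, Grade, Class) tuples purely to keep the example short:

```python
import pandas as pd

# The question's data, rebuilt from (Student, Grade, Class) tuples for brevity
rows = [
    ("Eric", 96, "A"), ("Caden", 92, "A"), ("Sam", 90, "A"), ("Leon", 88, "A"),
    ("Laura", 80, "B"), ("Leann", 22, "B"),
    ("Glen", 9, "C"), ("Jack", 90, "C"), ("Jill", 87, "C"), ("Joe", 58, "C"),
    ("Andrew", 48, "D"),
    ("Travis", 39, "E"), ("Henry", 23, "E"), ("Chris", 19, "E"),
    ("Jim", 1, "E"), ("Sarah", 93, "E"), ("Brit", 92, "E"),
]
df = pd.DataFrame(rows, columns=["Student", "Grade", "Class"])

top = df.groupby("Class")["Grade"].nlargest(2)

# Level 0 holds the group key, level 1 holds the original row label
print(top.index.nlevels)                          # 2
print(top.index.get_level_values(0).tolist())     # group keys, one per kept row
print(top.index.get_level_values(1).tolist())     # [0, 1, 4, 5, 7, 8, 10, 15, 16]
```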

Chris Adams · Accepted Answer · score 2 · 2019-04-19T17:08:25.860

IIUC (if I understand correctly), you can sort_values by Class and Grade, then apply head to the groupby object:

df_new = df.sort_values(['Class', 'Grade'], ascending=[True, False]).groupby('Class').head(2)

[out]

  Class  Grade Student
0      A     96    Eric
1      A     92   Caden
4      B     80   Laura
5      B     22   Leann
7      C     90    Jack
8      C     87    Jill
10     D     48  Andrew
15     E     93   Sarah
16     E     92    Brit

For your Version 2 output, just select the Student column:

df_new[['Student']]

   Student
0     Eric
1    Caden
4    Laura
5    Leann
7     Jack
8     Jill
10  Andrew
15   Sarah
16    Brit
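The sort-then-head chain above can be wrapped in a small reusable function that produces both versions. This is a minimal sketch; the helper name top_n_per_group is hypothetical and not part of pandas, and the data is the question's, rebuilt as tuples for brevity:

```python
import pandas as pd


def top_n_per_group(frame, group_col, value_col, n=2):
    """Hypothetical helper: the n rows with the largest value_col per group.

    Sorts by group ascending and value descending, then keeps the first
    n rows of each group, the same sort_values + head chain as above.
    """
    ordered = frame.sort_values([group_col, value_col], ascending=[True, False])
    return ordered.groupby(group_col).head(n)


# The question's data, rebuilt from (Student, Grade, Class) tuples
rows = [
    ("Eric", 96, "A"), ("Caden", 92, "A"), ("Sam", 90, "A"), ("Leon", 88, "A"),
    ("Laura", 80, "B"), ("Leann", 22, "B"),
    ("Glen", 9, "C"), ("Jack", 90, "C"), ("Jill", 87, "C"), ("Joe", 58, "C"),
    ("Andrew", 48, "D"),
    ("Travis", 39, "E"), ("Henry", 23, "E"), ("Chris", 19, "E"),
    ("Jim", 1, "E"), ("Sarah", 93, "E"), ("Brit", 92, "E"),
]
df = pd.DataFrame(rows, columns=["Student", "Grade", "Class"])

version1 = top_n_per_group(df, "Class", "Grade", n=2)  # all original columns
version2 = version1[["Student"]]                       # names only
print(version1)
print(version2)
```

Because head keeps the original row labels, the result can still be aligned or joined back to df if needed.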

edited Apr 19 '19 at 17:08

answered Apr 19 '19 at 16:57


Nice! How can I return just the Student column? Inserting "df[['Student']]" raises an error about Class. But I assume that is because selecting "df[['Student']]" strips out all of the other columns before the trailing commands are executed. – enter_display_name_here Apr 19 '19 at 17:02

append `[['Student']]` onto the end of `... head(2)` – Chris Adams Apr 19 '19 at 17:04
Oh geez, I didn't even see that part of the post "df_new[['Student']]". Thanks! – enter_display_name_here Apr 19 '19 at 17:06

@enter_display_name_here i only added it after you asked :) – Chris Adams Apr 19 '19 at 17:07
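The exchange above can be checked directly: selecting [['Student']] before sorting drops the Class and Grade columns, so sort_values cannot find them and raises a KeyError, while appending [['Student']] after head(2) works. A minimal sketch, assuming the question's data rebuilt as tuples:

```python
import pandas as pd

# The question's data, rebuilt from (Student, Grade, Class) tuples
rows = [
    ("Eric", 96, "A"), ("Caden", 92, "A"), ("Sam", 90, "A"), ("Leon", 88, "A"),
    ("Laura", 80, "B"), ("Leann", 22, "B"),
    ("Glen", 9, "C"), ("Jack", 90, "C"), ("Jill", 87, "C"), ("Joe", 58, "C"),
    ("Andrew", 48, "D"),
    ("Travis", 39, "E"), ("Henry", 23, "E"), ("Chris", 19, "E"),
    ("Jim", 1, "E"), ("Sarah", 93, "E"), ("Brit", 92, "E"),
]
df = pd.DataFrame(rows, columns=["Student", "Grade", "Class"])

# Selecting first: Class and Grade are gone before sort_values runs
raised_key_error = False
try:
    df[["Student"]].sort_values(["Class", "Grade"])
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)

# Selecting last: sort, take the top 2 per class, then keep only Student
names = (
    df.sort_values(["Class", "Grade"], ascending=[True, False])
      .groupby("Class")
      .head(2)[["Student"]]
)
print(names)
```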

score 1 · Answer 2 · answered Apr 19 '19 at 17:05

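Another option, which replicates your nlargest process, is to take the original row labels from the second level of the result's index and select those rows with loc: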

df.loc[df.groupby('Class')['Grade'].nlargest(2).index.get_level_values(1)]

   Class  Grade Student
0      A     96    Eric
1      A     92   Caden
4      B     80   Laura
5      B     22   Leann
7      C     90    Jack
8      C     87    Jill
10     D     48  Andrew
15     E     93   Sarah
16     E     92    Brit
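The loc one-liner works in three steps: nlargest returns a Series indexed by (Class, original row label), get_level_values(1) extracts the row labels, and df.loc selects those rows in that order. A minimal sketch that spells out the steps and also produces the names-only version, assuming the question's data rebuilt as tuples:

```python
import pandas as pd

# The question's data, rebuilt from (Student, Grade, Class) tuples
rows = [
    ("Eric", 96, "A"), ("Caden", 92, "A"), ("Sam", 90, "A"), ("Leon", 88, "A"),
    ("Laura", 80, "B"), ("Leann", 22, "B"),
    ("Glen", 9, "C"), ("Jack", 90, "C"), ("Jill", 87, "C"), ("Joe", 58, "C"),
    ("Andrew", 48, "D"),
    ("Travis", 39, "E"), ("Henry", 23, "E"), ("Chris", 19, "E"),
    ("Jim", 1, "E"), ("Sarah", 93, "E"), ("Brit", 92, "E"),
]
df = pd.DataFrame(rows, columns=["Student", "Grade", "Class"])

# Step 1: top 2 grades per class, indexed by (Class, original row label)
top = df.groupby("Class")["Grade"].nlargest(2)

# Step 2: the second index level holds the original row labels
labels = top.index.get_level_values(1)

# Step 3: select those rows from the original frame
version1 = df.loc[labels]               # all original columns
version2 = df.loc[labels, ["Student"]]  # names only
print(version1)
print(version2)
```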
