I'm working with the recent_grads practice data from fivethirtyeight, and I'm trying to see which major ('Major') has the largest number of students ('Total') in each major category ('Major_category').
Here's an example dataframe:
Major          Major_category  Total
Petroleum Eng  Engineering      1001
Nuclear Eng    Engineering      4350
Marketing      Business        10035
Accounting     Business         3051
I would like to have output like the following:
Major          Major_category  Total
Nuclear Eng    Engineering      4350
Marketing      Business        10035
...where only the Majors that have the largest Total in each Major Category are returned.
I've used a groupby statement that returns the largest number of students in each major category like so:
recent_grads.groupby('Major_category')['Total'].agg('max')
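To make this reproducible, here is a minimal sketch of my attempt. It assumes a small hand-built dataframe containing only the four sample rows above rather than the full recent_grads data; the name max_totals is just one I picked for illustration.

```python
import pandas as pd

# Sample rows copied from the example above (not the full recent_grads data).
recent_grads = pd.DataFrame({
    "Major": ["Petroleum Eng", "Nuclear Eng", "Marketing", "Accounting"],
    "Major_category": ["Engineering", "Engineering", "Business", "Business"],
    "Total": [1001, 4350, 10035, 3051],
})

# Current attempt: the largest Total per category.
# The result is a Series indexed by Major_category, so 'Major' is not part of it.
max_totals = recent_grads.groupby("Major_category")["Total"].agg("max")
print(max_totals)
```

The result only carries the category and the maximum, which is why the major name is lost.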
As expected, this returns the largest student count in each major category. What I can't figure out is where to put the 'Major' column in the above code so that my output tells me not only the largest student count in each major category, but also which major it belongs to. My code throws an error no matter where I try putting 'Major', but it feels like I'm missing something obvious.
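One direction that looks like it might work, though I'm not sure it's the idiomatic way, is to ask the groupby for the row label of each maximum with idxmax and then select those rows with loc. This is only a sketch on the four sample rows (not the full recent_grads data); top_idx and top_majors are names I made up, and I'm assuming there are no missing values in 'Total'.

```python
import pandas as pd

# Sample rows copied from the example above (not the full recent_grads data).
recent_grads = pd.DataFrame({
    "Major": ["Petroleum Eng", "Nuclear Eng", "Marketing", "Accounting"],
    "Major_category": ["Engineering", "Engineering", "Business", "Business"],
    "Total": [1001, 4350, 10035, 3051],
})

# For each category, get the index label of the row with the largest Total.
top_idx = recent_grads.groupby("Major_category")["Total"].idxmax()

# Use those labels to pull the full rows, so 'Major' comes along.
top_majors = recent_grads.loc[top_idx]
print(top_majors)
```

If two majors tie for the largest Total in a category, idxmax keeps only the first one it finds, so this would return a single row for that category.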