I am trying to use the groupby, nlargest, and sum functions in Pandas together, but I am having trouble making it work.
State County Population
Alabama a 100
Alabama b 50
Alabama c 40
Alabama d 5
Alabama e 1
...
Wyoming a.51 180
Wyoming b.51 150
Wyoming c.51 56
Wyoming d.51 5
I want to use groupby to group by state, then get the top 2 counties by population within each state. Then I want to use only those top 2 county population numbers to get a sum for that state.
In the end, I'll have a list that contains each state and the combined population of its top 2 counties.
I can get the groupby and nlargest to work, but getting the sum of the nlargest(2) result is a challenge.
The line I have right now is simply: df.groupby('State')['Population'].nlargest(2)
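
For reference, here is a minimal sketch of the result being asked for. It assumes the data is in a DataFrame named df with the columns shown above, and it rebuilds only the sample rows listed in the question. The names top2, top2_sum, and top2_sum_apply are hypothetical and used only for illustration. The idea is that nlargest(2) on the grouped column returns a Series whose first index level is the state, so grouping again on that level and summing gives one total per state.

```python
import pandas as pd

# Sample rows copied from the question (the elided "..." rows are not included).
df = pd.DataFrame({
    "State": ["Alabama"] * 5 + ["Wyoming"] * 4,
    "County": ["a", "b", "c", "d", "e", "a.51", "b.51", "c.51", "d.51"],
    "Population": [100, 50, 40, 5, 1, 180, 150, 56, 5],
})

# The line from the question: the two largest populations per state.
# The result is indexed by (State, original row label).
top2 = df.groupby("State")["Population"].nlargest(2)

# Group again on the first index level (State) and sum the two values.
top2_sum = top2.groupby(level=0).sum()

# One possible alternative: do both steps inside a single apply per group.
top2_sum_apply = df.groupby("State")["Population"].apply(
    lambda s: s.nlargest(2).sum()
)

print(top2_sum)
```

Both variants should give one row per state. The second is shorter to read, while the first keeps the intermediate top-2 values available if they are needed elsewhere.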