How to sum the nlargest() integers in groupby

Question

I have a dataframe like this:

    Index STNAME COUNTY COUNTY_POP
      0     AL     0       100
      1     AL     1       150
      2     AL     3       200
      3     AL     5       50
    ...
     15     CA     0       300
     16     CA     1       200
     17     CA     3       250
     18     CA     4       350

I want to sum the three largest integers from COUNTY_POP for each state. So far, I have:

    In[]: df.groupby(['STNAME'])['COUNTY_POP'].nlargest(3)
    Out[]:
    Index STNAME COUNTY COUNTY_POP
      0     AL     0       100
      1     AL     1       150
      2     AL     3       200
    ...
     15     CA     0       300
     17     CA     3       250
     18     CA     4       350

However, when I add the .sum() operation to the above code, I get a single number instead of one sum per state:

    In[]: df.groupby(['STNAME'])['COUNTY_POP'].nlargest(3).sum()
    Out[]:
    1350

I'm relatively new to Python and Pandas. If anyone could explain what causes this and how to correct it, I'd really appreciate it!

score 7 · Accepted Answer · answered Nov 09 '16 at 22:56


Is that what you want?

    In [25]: df.groupby('STNAME')['COUNTY_POP'].agg(lambda x: x.nlargest(3).sum())
    Out[25]:
    STNAME
    AL    450
    CA    900
    Name: COUNTY_POP, dtype: int64
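As for what causes the single number in the question: as far as I understand it, calling .nlargest(3) on the grouped column does not stay grouped. It returns one ordinary Series whose index has two levels (the state and the original row label), so a plain .sum() adds up every value in it. Here is a minimal sketch of that; the sample frame is made up from the rows shown in the question, and the names top3, total and per_state are just for illustration.

```python
import pandas as pd

# Hypothetical sample data modelled on the rows shown in the question.
df = pd.DataFrame({
    "STNAME": ["AL", "AL", "AL", "AL", "CA", "CA", "CA", "CA"],
    "COUNTY": [0, 1, 3, 5, 0, 1, 3, 4],
    "COUNTY_POP": [100, 150, 200, 50, 300, 200, 250, 350],
})

# One Series with a two-level index: (STNAME, original row label).
top3 = df.groupby("STNAME")["COUNTY_POP"].nlargest(3)

# A plain .sum() collapses all six values into a single number.
total = top3.sum()

# Grouping again on the first index level gives one sum per state.
per_state = top3.groupby(level=0).sum()

print(total)
print(per_state)
```

Regrouping on the index level is an alternative to the .agg() call above; both should give one value per state.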


MaxU - stand with Ukraine


This worked great, @MaxU! Thank you. Can you explain the use of .agg() and lambda? – R7L208 Nov 10 '16 at 15:25
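On the .agg() and lambda question, a rough sketch of how I read it: .agg() calls the function you give it once per group, passing in that group's COUNTY_POP values as a Series, and collects the single value returned for each group. The lambda is just an unnamed function written inline. The sample frame and the helper name top3_sum below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data modelled on the rows shown in the question.
df = pd.DataFrame({
    "STNAME": ["AL", "AL", "AL", "AL", "CA", "CA", "CA", "CA"],
    "COUNTY": [0, 1, 3, 5, 0, 1, 3, 4],
    "COUNTY_POP": [100, 150, 200, 50, 300, 200, 250, 350],
})

def top3_sum(pops):
    # pops is the COUNTY_POP Series for a single state.
    return pops.nlargest(3).sum()

by_state = df.groupby("STNAME")["COUNTY_POP"]

# Named function and lambda do the same work; the lambda is only shorter.
named = by_state.agg(top3_sum)
with_lambda = by_state.agg(lambda x: x.nlargest(3).sum())

print(named)
```

Writing the function out with a name can be easier to read and debug; the lambda form is handy for one-liners like this.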

score 2 · Answer 2 · answered Nov 09 '16 at 23:40


presort and slice... a tad faster

    df.sort_values('COUNTY_POP').groupby('STNAME').COUNTY_POP \
        .apply(lambda x: x.values[-3:].sum())

    STNAME
    AL    450
    CA    900
    Name: COUNTY_POP, dtype: int64


piRSquared


This is only faster for very small groups. That is the point of nlargest: it doesn't need to sort. – Jeff Nov 10 '16 at 00:18
@Jeff thank you for the clarification – piRSquared Nov 10 '16 at 00:23
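If you want to check the speed difference on your own data, something like the following should do. This is only a sketch: the synthetic frame (50 states, 20,000 counties) and the function names with_nlargest and with_presort are assumptions for illustration, and timings will vary with group size, machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): 50 states, 20,000 counties.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "STNAME": rng.integers(0, 50, size=20_000).astype(str),
    "COUNTY_POP": rng.integers(1, 1_000_000, size=20_000),
})

def with_nlargest():
    # Approach from the accepted answer.
    return df.groupby("STNAME")["COUNTY_POP"].agg(lambda x: x.nlargest(3).sum())

def with_presort():
    # Presort-and-slice approach: the last three values per group are the largest.
    return (df.sort_values("COUNTY_POP")
              .groupby("STNAME")["COUNTY_POP"]
              .apply(lambda x: x.values[-3:].sum()))

# Both approaches should agree before their timings are worth comparing.
same = bool((with_nlargest() == with_presort()).all())
print("same result:", same)
print("nlargest:", timeit.timeit(with_nlargest, number=5))
print("presort: ", timeit.timeit(with_presort, number=5))
```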
