0
test= pd.DataFrame({'Year':['2016','2016','2016','2017','2017','2017']
                    ,'Country' : ['NL','GR','AU','NL','GB','BR']
                    ,'Count' : ['100','200','5','1000','2000','3']}) 
df;

    Year    Country Count
0   2016    NL      100
1   2016    GR      200
2   2016    AU      5
3   2017    NL      1000
4   2017    GB      2000
5   2017    BR      3

I would like to select the top 2 entries based on Column Count and Year.

such as

    Year  Country   Count
0   2016    NL      100
1   2016    GR      200
3   2017    NL      1000
4   2017    GB      2000

So for each year I want to see the top 2 countries. Any idea?

A.Papa
  • 486
  • 2
  • 8
  • 20
  • 2
    Possible duplicate of [Pandas get topmost n records within each group](https://stackoverflow.com/questions/20069009/pandas-get-topmost-n-records-within-each-group) – Teoretic Sep 05 '18 at 07:44

1 Answers1

0

First is necessary convert column Count to integers, then sort_values and last use GroupBy.tail or GroupBy.head:

test['Count'] = test['Count'].astype(int)
df = test.sort_values(['Year','Count']).groupby('Year').tail(2)
print (df)
   Year Country  Count
0  2016      NL    100
1  2016      GR    200
3  2017      NL   1000
4  2017      GB   2000

If want change order in count column:

test['Count'] = test['Count'].astype(int)
df = test.sort_values(['Year','Count'], ascending=[True, False]).groupby('Year').head(2)
print (df)
   Year Country  Count
1  2016      GR    200
0  2016      NL    100
4  2017      GB   2000
3  2017      NL   1000
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252