0

I have pandas DataFrame read from csv.

 import pandas as pd
 df = pd.read_csv("record.csv")

 df # data contain in csv 

 provinces  month  total number 
 BC         1      20
 Ontrio     1      10
 Ontrio     2      3
 BC         2      15
 Yukon      2      33
 Nunavut    3      17
 BC         3      6
 Yukon      3      3

What i want to get is the top 2 provinces of each month with the highest total number:

 provinces  month  total number 
 BC         1      20
 Ontrio     1      10
 BC         2      15
 Yukon      2      33
 Nunavut    3      17
 BC         3      6

I can done by using pandas query but how can i design a function to achieve the result?

Fufu Alex
  • 37
  • 1
  • 1
  • 9

1 Answers1

-1
df =pd.DataFrame({
    'prov':['BC','Ont','Ont','BC','Yu','Nun','BC','Yu'],
    'month':[1,1,2,2,2,3,3,3],
    'total_no':[20,10,3,15,33,17,6,3]
})


df=df.sort_values(by=['month','total_no'],ascending=False)

df=df.groupby('month').head(2)
print(df)


    prov  month  total_no
5  Nun      3        17
6   BC      3         6
4   Yu      2        33
3   BC      2        15
0   BC      1        20
1  Ont      1        10
sygneto
  • 1,761
  • 1
  • 13
  • 26