0

I have a data frame as follows

REG LOC DATE        SUM
1   A1  19-07-20     10 
1   B1  19-07-20     25 
1   C1  19-07-20     20 
2   A2  19-07-20     25 
2   B2  19-07-20     30 
2   C3  19-07-20     45 
1   A1  20-07-20     15 
1   B1  20-07-20     20 
1   C1  20-07-20     30 
2   A2  20-07-20     10 
2   B2  20-07-20     15 
2   C3  20-07-20     30 
1   A1  21-07-20     25 
1   B1  21-07-20     35 
1   C1  21-07-20     45 
2   A2  21-07-20     20 
2   B2  21-07-20     30 
2   C3  21-07-20     40 

I want to find LOC with smallest 2 value of SUM for each region and date combination. For example for Date 19-7-20 and region 1, smallest is Loc A1 and C1 and for region 2 is A2 and B2. I am able to do it for one level with following code but not able to introduce another level in the code.

groupby(level=0,group_keys=False).apply(lambda x: x.nsmallest())

How can I do it for 2 levels not just one level up when I want n smallest values for a combination.

Thanks

arb
  • 35
  • 3
  • 1
    `df.loc[df.groupby(['REG','DATE'])['SUM'].idxmin(), 'LOC']` – Quang Hoang Jul 20 '20 at 12:46
  • 1
    Use `df = df.sort_values(['REG','SUM']).groupby(['REG','DATE']).head(2)` – jezrael Jul 20 '20 at 13:05
  • What if I need more than one smallest value i.e 5 smallest values. That's the reason I used n smallest. – arb Jul 20 '20 at 13:05
  • 1
    @jezrael Thanks. It worked. I was trying another approach mentioned in the suggested thread and was not able to get smallest n values with transform. Thanks again. – arb Jul 20 '20 at 13:13

0 Answers0