2

I am just getting to know pandas and I can't get over a conceptual problem. My dataframe is as follows:

df=pd.DataFrame({'ANIMAL':[1,1,1,1,1,2,2,2],
            'AGE_D' : [3,6,47,377,698,1,9,241],
            'AGE_Y' : [1,1,1,2,2,1,1,1]})

I would like to do a nested group within animal and age_y and then select the min on the subgroup. Desired output would be then:

ANIMAL  AGE_Y   AGE_D
1       1       3
1       2       377
2       1       1

I can do this without nesting within animal, e.g. if my df2 = subset for ANIMAL=1 then

df2.loc[df2.groupby('AGE_Y')['AGE_D'].idxmin()]

But all the things I tried with nesting the animal in the group by were unsuccesful. I am guessing that my order of the operations is wrong... How should I go about this?

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
branwen85
  • 1,606
  • 5
  • 20
  • 25
  • Dupe: http://stackoverflow.com/questions/23394476/keep-other-columns-when-using-min-with-groupby basically the only difference is you're wanting to group on multiple columns – EdChum Nov 08 '16 at 15:49
  • Having seen the solution I can see how this is similar, but unless you know it, it's not so easy to deduce. I was actually looking at the thread you posted, but was stumbling with the nested grouping. – branwen85 Nov 08 '16 at 16:10
  • Basically the thing to note here that you can group on multiple columns which looks like the step you were missing here – EdChum Nov 08 '16 at 16:48

1 Answers1

3

I think you need add columns to groupby - group by columns ANIMAL and AGE_Y:

df = df2.loc[df2.groupby(['ANIMAL','AGE_Y'])['AGE_D'].idxmin()]
df = df[['ANIMAL','AGE_Y','AGE_D']]
print (df)
   ANIMAL  AGE_Y  AGE_D
0       1      1      3
3       1      2    377
5       2      1      1
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252