min() operation on nested groupby in pandas

Question

I am just getting to know pandas and I can't get over a conceptual problem. My dataframe is as follows:

df=pd.DataFrame({'ANIMAL':[1,1,1,1,1,2,2,2],
            'AGE_D' : [3,6,47,377,698,1,9,241],
            'AGE_Y' : [1,1,1,2,2,1,1,1]})

I would like to do a nested group within animal and age_y and then select the min on the subgroup. Desired output would be then:

ANIMAL  AGE_Y   AGE_D
1       1       3
1       2       377
2       1       1

I can do this without nesting within animal, e.g. if my df2 = subset for ANIMAL=1 then

df2.loc[df2.groupby('AGE_Y')['AGE_D'].idxmin()]

But all the things I tried with nesting the animal in the group by were unsuccesful. I am guessing that my order of the operations is wrong... How should I go about this?

Dupe: http://stackoverflow.com/questions/23394476/keep-other-columns-when-using-min-with-groupby basically the only difference is you're wanting to group on multiple columns — EdChum, Nov 08 '16 at 15:49
Having seen the solution I can see how this is similar, but unless you know it, it's not so easy to deduce. I was actually looking at the thread you posted, but was stumbling with the nested grouping. — branwen85, Nov 08 '16 at 16:10
Basically the thing to note here that you can group on multiple columns which looks like the step you were missing here — EdChum, Nov 08 '16 at 16:48

score 3 · Accepted Answer · answered Nov 08 '16 at 15:47

3

I think you need add columns to groupby - group by columns ANIMAL and AGE_Y:

df = df2.loc[df2.groupby(['ANIMAL','AGE_Y'])['AGE_D'].idxmin()]
df = df[['ANIMAL','AGE_Y','AGE_D']]
print (df)
   ANIMAL  AGE_Y  AGE_D
0       1      1      3
3       1      2    377
5       2      1      1

answered Nov 08 '16 at 15:47

jezrael

822,522
95
1,334
1,252

Thank you so much! I think I was really close (and yet so far...). This works a treat. – branwen85 Nov 08 '16 at 16:08

min() operation on nested groupby in pandas

1 Answers1

Linked