Groupby without losing a column

Question

I'm having an issue with a pandas DataFrame. It has three columns: the first two are identifiers (str) and the third is a number.

I would like to group by the first column and, for each group, keep the maximum of the third column together with the value of the second column from the row where that maximum occurs.

That's not quite clear, so here is an example. My dataframe looks like:

    id1              id2                amount
0   first_person     first_category     18
1   first_person     second_category    37
2   second_person    first_category     229
3   second_person    third_category     23

The code to build it, if you need it:

import pandas as pd
df = pd.DataFrame([['first_person','first_category',18],['first_person','second_category',37],['second_person','first_category',229],['second_person','third_category',23]],columns = ['id1','id2','amount'])

And I would like to get:

    id1              id2                amount
0   first_person     second_category    37
1   second_person    first_category     229

I have tried a groupby method, but it makes me lose the second column:

result = df.groupby(['id1'],as_index=False).agg({'amount':np.max})

`df.groupby(['id1'],as_index=False).max()` - is that what you want? — MaxU - stand with Ukraine, Apr 26 '16 at 09:32
The thing is, it's not everytime the last category that corresponds to the biggest amount (*edited my post to make it clear) — ysearka, Apr 26 '16 at 09:35
@MaxU thought it'd be that first too, but it returns maximum values of both `id2` and `amount`, not the row with maximum of `amount`. — Ilja Everilä, Apr 26 '16 at 09:36
but you have to define rules - which aggregate function to apply on `id2` column — MaxU - stand with Ukraine, Apr 26 '16 at 09:36
I want for every person the category in which amount is maximum. (and the corresponding amount as well) — ysearka, Apr 26 '16 at 09:39
http://stackoverflow.com/questions/15705630/python-getting-the-row-which-has-the-max-value-in-groups-using-groupby — Ilja Everilä, Apr 26 '16 at 09:44

EdChum · Accepted Answer · 2016-04-26T10:58:40.687

If I understand correctly, you want to group by 'id1', find the index label of the row with the largest amount in each group using idxmax, and use those labels to index into your original df:

In [9]:
df.loc[df.groupby('id1')['amount'].idxmax()]

Out[9]:
             id1              id2  amount
1   first_person  second_category      37
2  second_person   first_category     229
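
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, built on the sample data from the question. The names `best_rows`, `result` and `naive` are only illustrative, and the `reset_index(drop=True)` step is an assumption on my part that you want a fresh 0..n-1 index as in the desired output. The last lines contrast it with the column-wise `max()` suggested in the comments, which takes the maximum of `id2` and `amount` independently.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["first_person", "first_category", 18],
        ["first_person", "second_category", 37],
        ["second_person", "first_category", 229],
        ["second_person", "third_category", 23],
    ],
    columns=["id1", "id2", "amount"],
)

# Index label of the row holding the largest amount within each id1 group.
best_rows = df.groupby("id1")["amount"].idxmax()

# Select whole rows by label, so id2 stays paired with its own amount.
# reset_index(drop=True) is optional; it only renumbers the result.
result = df.loc[best_rows].reset_index(drop=True)

# For contrast: a column-wise max aggregates id2 and amount separately,
# so it can pair a category with an amount from a different row.
naive = df.groupby("id1", as_index=False).max()

print(result)
print(naive)
```

Note that when several rows in a group share the maximum, `idxmax` returns the label of the first occurrence, so you get exactly one row per group.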

edited Apr 26 '16 at 10:58

answered Apr 26 '16 at 09:50
