2

I have the following dataframe:

d = {'item_code': ['Item1', 'Item2', 'Item3', 'Item1', 'Item2', 'Item2', 'Item2', 'Item1', 'Item3'], 'dollar_amount': ['200.25', '350.00', '120.00', '400.50', '1231.25', '700.00', '350.00', '200.25', '2340.00'], 'year': ['2010','2010','2010','2011','2011','2012','2012','2012','2012']}
df = pd.DataFrame(data=d)
df

    dollar_amount   item_code   year
0   200.25          Item1       2010
1   350.00          Item2       2010
2   120.00          Item3       2010
3   400.50          Item1       2011
4   1231.25         Item1       2011
5   700.00          Item2       2012
6   350.00          Item2       2012
7   200.25          Item1       2012
8   2340.00         Item3       2012

This is what I currently have:

df['dollar_amount'] = df['dollar_amount'].astype(float)
df.groupby('year', as_index=False).agg({'dollar_amount':'max'})

It gives me the following result:

    year    dollar_amount
0   2010    350.0000
1   2011    1,231.2500
2   2012    2,340.0000

I want to groupby year and find the maximum dollar_amount in that year and also get the corresponding item_code for that maximum dollar_amount.

So the expected outcome is something like this:

    year    dollar_amount   item_code
0   2010    350.0000        item2
1   2011    1,231.2500      item2
2   2012    2,340.0000      item3
el323
  • 2,760
  • 10
  • 45
  • 80
  • pd.merge(df.groupby('year', as_index=False).agg({'dollar_amount':'max'}), df, on=['year','dollar_amount'],how='left', sort=False); – Poonam Jan 05 '18 at 10:36

0 Answers0