python pandas - getting a column value after running idxmax / argmax

Question

I am trying to go through some data to find which category of products had the highest revenue.

I can get the actual total revenue of the category with the highest revenue by running:

max_revenue_by_cat = summer_transactions.groupby('item_category_id')['total_sales'].sum().max()

But how do I then get the category_id that this max revenue belongs to, i.e. the category_id with the highest sum of total_sales?

https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Brad Solomon, Dec 18 '17 at 14:56
Uh, then `df.set_index('item_category_id').total_sales.sum(level=0).sort_values().iloc[[-1]]` — cs95, Dec 18 '17 at 15:00

score 2 · Answer 1 · answered Dec 18 '17 at 15:08

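Use set_index + sum(level=0) + sort_values (descending) + iloc to pick the first item. Note that sum(level=0) was removed in pandas 2.0; on newer versions, groupby(level=0).sum() is the replacement.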

df

   item_category_id  total_sales
0                 1          100
1                 1           10
2                 0          200
3                 2           20
4                 1          300
5                 0          100
6                 1           30
7                 2          400

r = df.set_index('item_category_id')\
      .total_sales.sum(level=0)\
      .sort_values(ascending=False)\
      .iloc[[0]]

item_category_id
1    440
Name: total_sales, dtype: int64

If you want this as a mini-dataframe, call reset_index on the result -

r.reset_index()

   item_category_id  total_sales
0                 1          440

Details

df.set_index('item_category_id').total_sales.sum(level=0)

item_category_id
1    440
0    300
2    420
Name: total_sales, dtype: int64

Here, the category with the largest sum is 1. Usually, with a small number of groups, the sort_values call takes negligible time, so this should be pretty performant.
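
For readers on pandas 2.0 or later, where `sum(level=0)` is no longer available, the same steps might look like the sketch below. It rebuilds the sample frame from this answer and swaps in `groupby(level=0).sum()`; the variable name `totals` is only illustrative.

```python
import pandas as pd

# Sample data copied from the answer above.
df = pd.DataFrame({
    "item_category_id": [1, 1, 0, 2, 1, 0, 1, 2],
    "total_sales": [100, 10, 200, 20, 300, 100, 30, 400],
})

# groupby(level=0).sum() stands in for the removed sum(level=0).
totals = df.set_index("item_category_id")["total_sales"].groupby(level=0).sum()

# Sort descending and keep the top row as a one-element Series.
r = totals.sort_values(ascending=False).iloc[[0]]
print(r)
```

As in the original, `r.reset_index()` turns the result into a one-row DataFrame.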

jezrael · Answer 2 · 2017-12-18T15:15:21.930

score 1

I think you need idxmax. To get back the index label together with the value (instead of a bare scalar), wrap the label in [] when passing it to loc:

import pandas as pd

summer_transactions = pd.DataFrame({'A':list('abcdef'),
                                    'total_sales':[5,3,6,9,2,4],
                                    'item_category_id':list('aaabbb')})


df = summer_transactions.groupby('item_category_id')['total_sales'].sum()

s = df.loc[[df.idxmax()]]
print (s)
item_category_id
b    15
Name: total_sales, dtype: int64


df = df.loc[[df.idxmax()]].reset_index(name='col')
print (df)
  item_category_id  col
0                b   15
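
If only the two scalars are needed (the winning category and its revenue), a minimal sketch along the same lines could skip the list wrapping. It reuses the sample data from this answer; `totals`, `best_category` and `best_revenue` are illustrative names, not part of the original code.

```python
import pandas as pd

# Sample data copied from the answer above.
summer_transactions = pd.DataFrame({
    "A": list("abcdef"),
    "total_sales": [5, 3, 6, 9, 2, 4],
    "item_category_id": list("aaabbb"),
})

totals = summer_transactions.groupby("item_category_id")["total_sales"].sum()

best_category = totals.idxmax()           # index label of the largest sum
best_revenue = totals.loc[best_category]  # the sum itself
print(best_category, best_revenue)
```

Here `idxmax` answers the "which category" part of the question, and a plain `loc` lookup recovers the revenue without a second `max()` call.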

edited Dec 18 '17 at 15:15

answered Dec 18 '17 at 15:02

Yes, I agree, the first part of the answers is the same. But the OP needs the second part and it is different, so I think it is OK. – jezrael Dec 18 '17 at 16:42

BENY · Accepted Answer · 2017-12-20T16:50:25.667

score 1

By using coldspeed's data :-)

(df.groupby('item_category_id').total_sales.sum()).loc[lambda x : x==x.max()]


Out[11]: 
item_category_id
1    440
Name: total_sales, dtype: int64
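
One difference from idxmax that may matter: the mask `x == x.max()` keeps every category tied for the maximum, while `idxmax` returns only the first such label. A small sketch with made-up data (the `tied` frame below is hypothetical, chosen so that two categories tie):

```python
import pandas as pd

# Hypothetical data: categories 0 and 1 both sum to 300.
tied = pd.DataFrame({
    "item_category_id": [0, 0, 1, 2],
    "total_sales": [200, 100, 300, 50],
})

totals = tied.groupby("item_category_id")["total_sales"].sum()

top_all = totals.loc[lambda x: x == x.max()]  # keeps both tied categories
top_first = totals.idxmax()                   # first label with the maximum
print(top_all)
print(top_first)
```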

edited Dec 20 '17 at 16:50

answered Dec 18 '17 at 15:09

but the OP needs the value of the index and of the column too ;) – jezrael Dec 18 '17 at 15:10
@Wen - What do you think about [this comment](https://stackoverflow.com/questions/47870988/python-pandas-getting-a-column-value-after-running-idxmax-argmax/47871136#comment82708740_47871136) ? – jezrael Dec 18 '17 at 16:43
@jezrael ummm, my real answer is `(df.groupby('item_category_id').total_sales.sum()).loc[lambda x : x==x.max()]` – BENY Dec 18 '17 at 16:45
@wen - I'd actually accept this as an answer if you convert it. – mheavers Dec 20 '17 at 16:40
@mheavers you mean the order ? – BENY Dec 20 '17 at 16:49
