Choose Highest Value Column

Question

I need your help. A dataframe stores the probabilities of three categories, as follows:

dict_test = {'series': [1, 2, 3, 4, 5, 6, 7],
              'cat_1': [.02, .02, .81, .72, .01, .3, .45],
              'cat_2': [.02, .02, .14, .2, .99, .45, .4],
              'cat_3': [.96, .96, .05, .08, .00, .25, .15]}

import pandas as pd
df = pd.DataFrame(dict_test)

I need to create a new column to store which category has the highest probability. What I've been able to do so far is select the highest probability using the agg function:

df['choice'] = df.drop('series', axis=1).agg('max', axis=1)

The result I need is exemplified with this dataframe:

dict_test = {'series': [1, 2, 3, 4, 5, 6, 7],
              'cat_1': [.02, .02, .81, .72, .01, .3, .45],
              'cat_2': [.02, .02, .14, .2, .99, .45, .4],
              'cat_3': [.96, .96, .05, .08, .00, .25, .15],
              'result': ['cat_3', 'cat_3', 'cat_1', 'cat_1', 'cat_2', 'cat_2', 'cat_1']}

df = pd.DataFrame(dict_test)

Any suggestions?

Adrien Matissart · Accepted Answer · 2019-09-28T21:34:28.427


You are looking for `idxmax`, which returns, for each row, the label of the column that holds the maximum value:

df['result'] = df.filter(regex='^cat').idxmax(axis=1)
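For completeness, here is a minimal end-to-end sketch using the data from the question. The intermediate name `probs` is only an illustrative choice and is not part of the original answer; the `choice` column reproduces the maximum probability the question already computed.

```python
import pandas as pd

# Data taken from the question.
df = pd.DataFrame({
    'series': [1, 2, 3, 4, 5, 6, 7],
    'cat_1': [.02, .02, .81, .72, .01, .3, .45],
    'cat_2': [.02, .02, .14, .2, .99, .45, .4],
    'cat_3': [.96, .96, .05, .08, .00, .25, .15],
})

# Keep only the columns whose names start with "cat".
# `probs` is an illustrative name, not something from the answer.
probs = df.filter(regex='^cat')

# Per row: label of the column that holds the largest value.
df['result'] = probs.idxmax(axis=1)

# Per row: the largest value itself (what the question computed with agg).
df['choice'] = probs.max(axis=1)

print(df[['series', 'result', 'choice']])
```

If two categories tie for the maximum in a row, `idxmax` returns the first of them in column order.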

edited Sep 28 '19 at 21:34

answered Sep 28 '19 at 21:04


Why use a filter? I didn't understand. – Ângelo Sep 28 '19 at 21:09
`agg` needs to be applied on the 3 `cat_*` columns only. In your example you used `drop` which is equivalent, but it seems to me that `filter` is more practical if other columns are present in real data. – Adrien Matissart Sep 28 '19 at 21:13

No need for `agg`, just use `.idxmax(axis=1)` – Erfan Sep 28 '19 at 21:29
good point, thanks – Adrien Matissart Sep 28 '19 at 21:37
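
To illustrate the `drop` versus `filter` point from the comments, here is a small sketch on a shortened version of the question's data. The extra `source` column is hypothetical, added only to show what happens when other non-category columns are present.

```python
import pandas as pd

# First, third and sixth rows of the question's data.
df = pd.DataFrame({
    'series': [1, 3, 6],
    'cat_1': [.02, .81, .3],
    'cat_2': [.02, .14, .45],
    'cat_3': [.96, .05, .25],
})

# With only `series` to exclude, both selections give the same answer.
via_drop = df.drop('series', axis=1).idxmax(axis=1)
via_filter = df.filter(regex='^cat').idxmax(axis=1)
same = via_drop.equals(via_filter)

# Hypothetical extra column: `filter` still picks only the cat_* columns,
# whereas the `drop` call would need to list this column as well.
df['source'] = ['a', 'b', 'c']
still_ok = df.filter(regex='^cat').idxmax(axis=1)

print(same)
print(still_ok.tolist())
```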
