Find column with the highest value (pandas)

Question

I have a Pandas dataframe with several columns that range from 0 to 100. I would like to add a column on to the dataframe that contains the name of the column from among these that has the greatest value for each row. So:

one   two   three four  COLUMN_I_WANT_TO_CREATE
5     40    12    19    two
90    15    58    23    one
74    95    34    12    two
44    81    22    97    four
10    59    59    44    [either two or three, selected randomly]

etc.

Bonus points if the solution can resolve ties randomly.

Without the random tie resolution, I think this is clearly a duplicate of [this question](http://stackoverflow.com/questions/29919306/find-the-column-name-which-has-maximum-value-for-each-row-pandas) and probably others. Maybe we should concentrate on that. — DSM, Apr 13 '16 at 12:10
Does this answer your question? [Find the column name which has the maximum value for each row](https://stackoverflow.com/questions/29919306/find-the-column-name-which-has-the-maximum-value-for-each-row) — cottontail, Jan 11 '23 at 18:40

jezrael · Answer 1 · 2016-04-14T05:07:23.297

You can use idxmax with parameter axis=1:

print df
   one  two  three  four
0    5   40     12    19
1   90   15     58    23
2   74   95     34    12
3   44   81     22    97

df['COLUMN_I_WANT_TO_CREATE'] = df.idxmax(axis=1)
print df
   one  two  three  four COLUMN_I_WANT_TO_CREATE
0    5   40     12    19                     two
1   90   15     58    23                     one
2   74   95     34    12                     two
3   44   81     22    97                    four

With random duplicity max values is it more complicated.

You can first find all max values by x[(x == x.max())]. Then you need index values, where apply sample. But it works only with Series, so index is converted to Series by to_series. Last you can select only first value of Serie by iloc:

print df
   one  two  three  four
0    5   40     12    19
1   90   15     58    23
2   74   95     34    12
3   44   81     22    97
4   10   59     59    44
5   59   59     59    59
6   10   59     59    59
7   59   59     59    59

#first run
df['COL']=df.apply(lambda x:x[(x==x.max())].index.to_series().sample(frac=1).iloc[0], axis=1)
print df
   one  two  three  four    COL
0    5   40     12    19    two
1   90   15     58    23    one
2   74   95     34    12    two
3   44   81     22    97   four
4   10   59     59    44  three
5   59   59     59    59    one
6   10   59     59    59    two
7   59   59     59    59  three

#one of next run
df['COL']=df.apply(lambda x:x[(x==x.max())].index.to_series().sample(frac=1).iloc[0], axis=1)
print df
   one  two  three  four    COL
0    5   40     12    19    two
1   90   15     58    23    one
2   74   95     34    12    two
3   44   81     22    97   four
4   10   59     59    44    two
5   59   59     59    59    one
6   10   59     59    59  three
7   59   59     59    59   four

Is `sample` doing the randomization? I don't quite see how it does that. — futuraprime, Apr 13 '16 at 12:27
Ah—that's not quite what I meant by resolving ties randomly. (I've updated my question to clarify.) — futuraprime, Apr 13 '16 at 12:32
@jezrael: this isn't the kind of sampling the OP wants. He wants to randomize *among the maxima per row*, not *among the maxima column*. — DSM, Apr 13 '16 at 12:32

Find column with the highest value (pandas)

1 Answers1