5

I have a Pandas dataframe with several columns that range from 0 to 100. I would like to add a column on to the dataframe that contains the name of the column from among these that has the greatest value for each row. So:

one   two   three four  COLUMN_I_WANT_TO_CREATE
5     40    12    19    two
90    15    58    23    one
74    95    34    12    two
44    81    22    97    four
10    59    59    44    [either two or three, selected randomly]

etc.

Bonus points if the solution can resolve ties randomly.

futuraprime
  • 5,468
  • 7
  • 38
  • 58
  • 1
    Without the random tie resolution, I think this is clearly a duplicate of [this question](http://stackoverflow.com/questions/29919306/find-the-column-name-which-has-maximum-value-for-each-row-pandas) and probably others. Maybe we should concentrate on that. – DSM Apr 13 '16 at 12:10
  • Yes, you're right—wish I'd found that one in my search. – futuraprime Apr 13 '16 at 12:26
  • Does this answer your question? [Find the column name which has the maximum value for each row](https://stackoverflow.com/questions/29919306/find-the-column-name-which-has-the-maximum-value-for-each-row) – cottontail Jan 11 '23 at 18:40

1 Answers1

7

You can use idxmax with parameter axis=1:

print df
   one  two  three  four
0    5   40     12    19
1   90   15     58    23
2   74   95     34    12
3   44   81     22    97

df['COLUMN_I_WANT_TO_CREATE'] = df.idxmax(axis=1)
print df
   one  two  three  four COLUMN_I_WANT_TO_CREATE
0    5   40     12    19                     two
1   90   15     58    23                     one
2   74   95     34    12                     two
3   44   81     22    97                    four

With random duplicity max values is it more complicated.

You can first find all max values by x[(x == x.max())]. Then you need index values, where apply sample. But it works only with Series, so index is converted to Series by to_series. Last you can select only first value of Serie by iloc:

print df
   one  two  three  four
0    5   40     12    19
1   90   15     58    23
2   74   95     34    12
3   44   81     22    97
4   10   59     59    44
5   59   59     59    59
6   10   59     59    59
7   59   59     59    59
#first run
df['COL']=df.apply(lambda x:x[(x==x.max())].index.to_series().sample(frac=1).iloc[0], axis=1)
print df
   one  two  three  four    COL
0    5   40     12    19    two
1   90   15     58    23    one
2   74   95     34    12    two
3   44   81     22    97   four
4   10   59     59    44  three
5   59   59     59    59    one
6   10   59     59    59    two
7   59   59     59    59  three

#one of next run
df['COL']=df.apply(lambda x:x[(x==x.max())].index.to_series().sample(frac=1).iloc[0], axis=1)
print df
   one  two  three  four    COL
0    5   40     12    19    two
1   90   15     58    23    one
2   74   95     34    12    two
3   44   81     22    97   four
4   10   59     59    44    two
5   59   59     59    59    one
6   10   59     59    59  three
7   59   59     59    59   four
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252