How to add a column that contains the corresponding column name with the largest number in Python?

Question

I have a data frame like this:

   A1  A2  A3  ...  A99  largest
0   3   4   6  ...   11       11
1   1   8   2  ...    1        8
...

I created the column which contains the largest value in each row by using:

data['largest']=data.max(axis=1)

but I also want a column containing the name of the column that holds the largest number, something like this:

   A1  A2  A3  ...  A99  largest  name
0   3   4   6  ...   11       11   A99
1   1   8   2  ...    1        8    A2
...

I tried '.idxmax', but it gave me the error 'reduction operation 'argmax' not allowed for this dtype'. Can someone help me? Many thanks.
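The error message refers to the dtype, so a reasonable first check is which columns pandas does not treat as numeric. A minimal sketch, using a hypothetical frame in which A2 was read in as strings (the sample values are illustrative, not the asker's real data):

```python
import pandas as pd

# Hypothetical sample: A2 holds strings, e.g. after reading a CSV with stray text
data = pd.DataFrame({
    "A1": [3, 1],
    "A2": ["4", "8"],
    "A3": [6, 2],
    "A99": [11, 1],
})

# A2 is listed with a non-numeric dtype (object, or str in newer pandas)
print(data.dtypes)

# Names of all columns whose dtype is not numeric
non_numeric = data.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

Any column that shows up in non_numeric has to be excluded or converted before row-wise max/idxmax can treat it as a number.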

jezrael · Accepted Answer · 2019-07-08T11:52:20.887

Use DataFrame.idxmax with DataFrame.assign to add both columns at once; because both values are computed from the original columns, the new largest column does not interfere with idxmax:

df = data.assign(largest=data.max(axis=1), name=data.idxmax(axis=1))
print (df)
   A1  A2  A3  A99  largest name
0   3   4   6   11       11  A99
1   1   8   2    1        8   A2
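A self-contained version of the assign approach, with the question's visible sample values typed in by hand (only A1, A2, A3 and A99; the frame is illustrative):

```python
import pandas as pd

# Sample frame mirroring the rows shown in the question
data = pd.DataFrame({
    "A1": [3, 1],
    "A2": [4, 8],
    "A3": [6, 2],
    "A99": [11, 1],
})

# Both right-hand sides are evaluated on the original columns before
# assign() adds anything, so 'largest' never takes part in idxmax
df = data.assign(largest=data.max(axis=1), name=data.idxmax(axis=1))
print(df)
```

If largest were added to data first and idxmax called afterwards, the new column would be scanned too; idxmax returns the first occurrence of the maximum, so the result would then depend on column order.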

Or use DataFrame.agg with axis=1:

data[['largest','name']] = data.agg(['max','idxmax'], axis=1)
print (data)
   A1  A2  A3  A99 largest name
0   3   4   6   11      11  A99
1   1   8   2    1       8   A2
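A runnable sketch of the agg route on the same hypothetical sample. Each aggregated row mixes a number with a column label, so the intermediate result usually comes back with object dtype; assuming you want largest to stay numeric, this version assigns the two columns by name and casts largest back to int:

```python
import pandas as pd

data = pd.DataFrame({
    "A1": [3, 1],
    "A2": [4, 8],
    "A3": [6, 2],
    "A99": [11, 1],
})

# One row-wise aggregation with two reductions; result columns are 'max' and 'idxmax'
agg = data.agg(["max", "idxmax"], axis=1)

# Assign explicitly by name; cast 'largest' back to an integer dtype
data["largest"] = agg["max"].astype(int)
data["name"] = agg["idxmax"]
print(data.dtypes)
```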

EDIT:

You can select only numeric columns:

import numpy as np

df1 = data.select_dtypes(np.number)

Or convert columns to numeric:

df1 = data.astype(int)

If .astype fails because some values are not numeric, use to_numeric with errors='coerce' to convert the problematic values to NaN:

df1 = data.apply(lambda x: pd.to_numeric(x, errors='coerce'))

df = data.assign(largest=df1.max(axis=1), name=df1.idxmax(axis=1))
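Putting the EDIT together, a minimal sketch on a hypothetical messy frame (A2 stored as strings and one unparseable text value in A99; both are assumptions for illustration):

```python
import pandas as pd

# Hypothetical messy input
data = pd.DataFrame({
    "A1": [3, 1],
    "A2": ["4", "8"],
    "A3": [6, 2],
    "A99": [11, "n/a"],
})

# Convert every column; values that cannot be parsed become NaN
df1 = data.apply(lambda x: pd.to_numeric(x, errors="coerce"))

# max and idxmax skip NaN by default, so the unparseable cell is ignored
df = data.assign(largest=df1.max(axis=1), name=df1.idxmax(axis=1))
print(df)
```

The original (unconverted) columns are kept in df; only largest and name come from the cleaned copy df1. A row that is entirely NaN after coercion has no maximum, so such rows would need separate handling.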

Thanks for the answer. I actually tried '.idxmax', but it gave me the error 'reduction operation 'argmax' not allowed for this dtype'. I have no idea why, because I checked the type and it's a DataFrame. – Cecilia, Jul 08 '19 at 11:49
@Cecilia - it seems some columns are non-numeric, give me a sec. – jezrael, Jul 08 '19 at 11:51

yatu · Answer 2 · 2019-07-08T11:33:33.317


Here's one approach that uses dot to keep the column name(s) where a value equals largest:

df['name'] = df.iloc[:,:-1].eq(df.largest.values[:,None]).dot(df.columns[:-1])

   A1  A2  A3  A99  largest name
0   3   4   6   11       11  A99
1   1   8   2    1        8   A2

edited Jul 08 '19 at 11:33

answered Jul 08 '19 at 11:28
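A self-contained sketch of the dot idea, with a hypothetical third row where A1 and A2 tie. It assumes largest is the last column (as iloc[:, :-1] in the answer does) and swaps the values[:, None] broadcast for eq(..., axis=0), which compares each row against its own entry of the largest Series:

```python
import pandas as pd

# Third row is a hypothetical tie between A1 and A2
data = pd.DataFrame({
    "A1": [3, 1, 5],
    "A2": [4, 8, 5],
    "A3": [6, 2, 1],
    "A99": [11, 1, 2],
})
data["largest"] = data.max(axis=1)

values = data.iloc[:, :-1]  # every column except 'largest'
# True where a cell equals its row's maximum
is_max = values.eq(data["largest"], axis=0)
# True * 'A1' -> 'A1', False * 'A1' -> ''; the dot product concatenates the pieces
data["name"] = is_max.dot(values.columns)
print(data)
```

Unlike idxmax, which returns only the first matching column, this keeps every column tied for the maximum, concatenated into one string.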


I got an error: 'shapes (100,) and (50,) not aligned: 100 (dim 0) != 50 (dim 0)' – Cecilia Jul 08 '19 at 11:56

score 1 · Answer 3 · answered Jul 08 '19 at 11:28


Using np.argmax():

df=df.assign(name=df.columns[np.argmax(df.values,axis=1)])

   A1  A2  A3  A99  largest name
0   3   4   6   11       11  A99
1   1   8   2    1        8   A2
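The one-liner above runs argmax over all columns, including largest; it still works because np.argmax returns the first position of the maximum and largest is the last column. A variant that drops the helper column explicitly, so column order no longer matters, could look like this (sample frame is illustrative, and all remaining columns are assumed numeric):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "A1": [3, 1],
    "A2": [4, 8],
    "A3": [6, 2],
    "A99": [11, 1],
})
data["largest"] = data.max(axis=1)

# Drop the helper column so argmax only looks at the original A* columns
values = data.drop(columns="largest")
positions = np.argmax(values.to_numpy(), axis=1)  # first position of each row's max
data["name"] = values.columns[positions]
print(data)
```

If some columns are strings, to_numpy() yields an object array, so convert them first (for example with pd.to_numeric as in the accepted answer's EDIT).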


anky

