I have a data frame like this:

    A1 A2 A3 ... A99 largest
0   3  4  6  ... 11  11
1   1  8  2  ... 1   8
.
.
.

I created the column which contains the largest value in each row by using:

data['largest']=data.max(axis=1)

I also want a column containing the name of the column that holds the largest value in each row, something like this:

    A1 A2 A3 ... A99 largest name
0   3  4  6  ... 11  11      A99
1   1  8  2  ... 1   8       A2
.                            .
.                            .
.                            .

I tried .idxmax, but it gave me the error "reduction operation 'argmax' not allowed for this dtype". Can someone help me? Many thanks.

Cecilia

3 Answers

Use DataFrame.idxmax with DataFrame.assign to add both columns at once, so that neither new column affects the computation of the other:

df = data.assign(largest=data.max(axis=1), name=data.idxmax(axis=1))
print (df)
   A1  A2  A3  A99  largest name
0   3   4   6   11       11  A99
1   1   8   2    1        8   A2

Or DataFrame.agg:

data[['largest','name']] = data.agg(['max','idxmax'], axis=1)
print(data)
   A1  A2  A3  A99 largest name
0   3   4   6   11      11  A99
1   1   8   2    1       8   A2
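
For anyone who wants to try the assign approach on its own, here is a minimal self-contained sketch. The sample frame is an assumption that mirrors only the four columns shown in the question, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data mirroring the question (only four of the columns).
data = pd.DataFrame({"A1": [3, 1], "A2": [4, 8], "A3": [6, 2], "A99": [11, 1]})

# Both values are computed from the original `data` before assign runs,
# so the new 'largest' column is not considered when 'name' is computed.
df = data.assign(largest=data.max(axis=1), name=data.idxmax(axis=1))
print(df)
```

Because assign returns a new frame, `data` itself is left unchanged here.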

EDIT:

You can select only numeric columns:

df1 = data.select_dtypes(np.number)

Or convert columns to numeric:

df1 = data.astype(int)

If .astype fails because some values are non-numeric, use to_numeric with errors='coerce' to convert the problematic values to NaN:

df1 = data.apply(lambda x: pd.to_numeric(x, errors='coerce'))

df = data.assign(largest=df1.max(axis=1), name=df1.idxmax(axis=1))
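
The following is a small sketch of the coercion route, assuming the error comes from columns that hold numbers stored as strings. The frame and the stray value "x" are made up for illustration.

```python
import pandas as pd

# Hypothetical frame where numbers were read in as strings,
# including one entry ("x") that cannot be parsed as a number.
data = pd.DataFrame({"A1": ["3", "1"], "A2": ["4", "8"], "A3": ["6", "x"]})
print(data.dtypes)  # shows non-numeric dtypes rather than int64

# Coerce every column to numeric; unparseable entries become NaN.
df1 = data.apply(pd.to_numeric, errors="coerce")

# max and idxmax skip NaN by default, so the bad entry is simply ignored.
df = data.assign(largest=df1.max(axis=1), name=df1.idxmax(axis=1))
print(df)
```

Checking data.dtypes first is a quick way to confirm whether non-numeric columns are the cause before converting anything.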
jezrael
  • Thanks for the answer. I actually tried .idxmax, but it gave me the error "reduction operation 'argmax' not allowed for this dtype". I have no idea why this showed up, because I checked the data type and it is a DataFrame. – Cecilia Jul 08 '19 at 11:49
  • @Cecilia - it seems some columns are non numeric, give me a sec. – jezrael Jul 08 '19 at 11:51
Here's one approach using dot to keep the column name where a value is equal to largest:

df['name'] = df.iloc[:,:-1].eq(df.largest.values[:,None]).dot(df.columns[:-1])

   A1  A2  A3  A99  largest name
0   3   4   6   11       11  A99
1   1   8   2    1        8   A2
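
To show why the dot trick works, here is a rough sketch on made-up data. Multiplying a boolean by a string keeps the string for True and gives an empty string for False, and the dot product then concatenates the results across each row. The tie in the second row is an assumption added to illustrate a side effect: tied columns end up joined together.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the second row has a tie between A1 and A3.
df = pd.DataFrame({"A1": [3, 8], "A2": [4, 1], "A3": [6, 8]})
largest = df.max(axis=1)

# Boolean mask: True where a cell equals the maximum of its row.
mask = df.eq(largest, axis=0)

# Column labels as a plain object array, so bool * str is ordinary
# Python string repetition ('A1' for True, '' for False).
labels = np.array(df.columns, dtype=object)

# Row-wise sum of the products, i.e. string concatenation.
names = mask.dot(labels)
print(names.tolist())
```

If ties are possible and only one name is wanted, idxmax or np.argmax may be a better fit, since both return the first matching column.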
yatu
Using np.argmax():

df=df.assign(name=df.columns[np.argmax(df.values,axis=1)])

   A1  A2  A3  A99  largest name
0   3   4   6   11       11  A99
1   1   8   2    1        8   A2
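
A self-contained variant of the same idea is sketched below on assumed sample data. It captures the value columns up front, so derived columns such as 'largest' never take part in the argmax.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the question (only four of the columns).
df = pd.DataFrame({"A1": [3, 1], "A2": [4, 8], "A3": [6, 2], "A99": [11, 1]})

# Remember the value columns before any derived columns are added.
value_cols = df.columns

# Position of the first maximum in each row, then map positions to labels.
positions = np.argmax(df[value_cols].to_numpy(), axis=1)
df = df.assign(largest=df[value_cols].max(axis=1), name=value_cols[positions])
print(df)
```

np.argmax returns the first position when several cells share the maximum, so ties resolve to the leftmost column.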
anky