Find the column name which has the 2nd maximum value for each row (pandas)

Question

The post "Find the column name which has the maximum value for each row" shows how to get the column name with the maximum value of each row using df.idxmax(axis=1).

How can I get the column name of the 2nd, 3rd, and so on largest value per row?

score 5 · Accepted Answer · answered Feb 13 '18 at 11:16

Use numpy.argsort to get the column positions of each row in descending order of value, then index the column names with those positions:

import numpy as np
import pandas as pd

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,5)), columns=list('ABCDE'))
print (df)
   A  B  C  D  E
0  8  8  3  7  7
1  0  4  2  5  2
2  2  2  1  0  8
3  4  0  9  6  2
4  4  1  5  3  4

arr = np.argsort(-df.values, axis=1)
# index the underlying array: recent pandas versions reject a 2-D key on an Index
df1 = pd.DataFrame(df.columns.to_numpy()[arr], index=df.index)
print (df1)
   0  1  2  3  4
0  A  B  D  E  C
1  D  B  C  E  A
2  E  A  B  C  D
3  C  D  A  E  B
4  C  A  E  D  B
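
To pull out only the n-th largest column per row, the same argsort idea can be wrapped in a small helper. This is a minimal sketch: the name nth_largest_column is hypothetical, the example frame is borrowed from the second answer, and kind="stable" is an added assumption so that tied values keep their left-to-right column order.

```python
import numpy as np
import pandas as pd


def nth_largest_column(df: pd.DataFrame, n: int) -> pd.Series:
    """Return, per row, the name of the column holding the n-th largest value (n=1 is the max)."""
    # Negate so that an ascending argsort yields descending order of values;
    # a stable sort keeps tied values in their original column order.
    order = np.argsort(-df.to_numpy(), axis=1, kind="stable")
    # Pick the (n-1)-th position of every row and map it back to a column label.
    return pd.Series(df.columns.to_numpy()[order[:, n - 1]], index=df.index)


df = pd.DataFrame([[1, 2, 4], [3, 1, 7], [10, 4, 2]], columns=["A", "B", "C"])
second = nth_largest_column(df, 2)
print(second.tolist())  # ['B', 'A', 'B']
```

Negating the values only works for numeric data; for other dtypes, sorting ascending and reversing the positions would be one alternative.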

Verify:

#first column
print (df.idxmax(axis=1))
0    A
1    D
2    E
3    C
4    C
dtype: object

#last column
print (df.idxmin(axis=1))
0    C
1    A
2    D
3    B
4    B
dtype: object
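
The comparison against idxmax can also be done programmatically. The sketch below rebuilds the frame from the printed table instead of relying on the random seed, and assumes a stable sort so that ties are resolved the same way as idxmax, which returns the first occurrence.

```python
import numpy as np
import pandas as pd

# Same values as the printed example frame above
df = pd.DataFrame(
    [[8, 8, 3, 7, 7],
     [0, 4, 2, 5, 2],
     [2, 2, 1, 0, 8],
     [4, 0, 9, 6, 2],
     [4, 1, 5, 3, 4]],
    columns=list("ABCDE"),
)

# Stable sort: among tied values, the leftmost column comes first
arr = np.argsort(-df.to_numpy(), axis=1, kind="stable")
df1 = pd.DataFrame(df.columns.to_numpy()[arr], index=df.index)

# The first ranked column should match idxmax in every row
first_matches = bool((df1[0] == df.idxmax(axis=1)).all())
print(first_matches)  # True
```

The last column only matches idxmin when each row's minimum is unique. With a tied minimum, the stable sort places the rightmost tied column last, while idxmin returns the leftmost.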

score 0 · Answer 2 · answered Feb 13 '18 at 11:41

pandas has no built-in method that returns the column holding the n-th largest value of a row, but you can rank the elements of each row with the rank method.

For example, for a dataframe like this:

df = pd.DataFrame([[1, 2, 4],[3, 1, 7], [10, 4, 2]], columns=['A','B','C'])

>>> print(df)
    A  B  C
0   1  2  4
1   3  1  7
2  10  4  2

You can get the ranks of each row by doing:

>>> df.rank(axis=1, method='dense', ascending=False)
     A    B    C
0  3.0  2.0  1.0
1  2.0  3.0  1.0
2  1.0  2.0  3.0

rank returns float ranks, even with method='dense'. Cast them to integers with astype(int):

>>> ranks = df.rank(axis=1, method='dense', ascending=False).astype(int)
>>> ranks
   A  B  C
0  3  2  1
1  2  3  1
2  1  2  3

Finding the indices is a little trickier in pandas, but it boils down to filtering on a condition (e.g. ranks==2):

>>> ranks.where(ranks==2)
     A    B   C
0  NaN  2.0 NaN
1  2.0  NaN NaN
2  NaN  2.0 NaN

Applying where keeps only the elements that match the condition and sets the rest to NaN. The row and column positions of the matches can then be retrieved with:

>>> ranks.where(ranks==2).notnull().values.nonzero()
    (array([0, 1, 2]), array([1, 0, 1]))

The second array holds the column position within each row, which is the answer to your question:

>>> ranks.where(ranks==2).notnull().values.nonzero()[1]
    array([1, 0, 1])

For the third element you just need to change the condition in where to ranks.where(ranks==3) and so on for other ranks.
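
Since the question asks for column names, not positions, the positions can be mapped back to labels. This is a minimal sketch on the same example frame; the variable names rows, cols and second_largest are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 4], [3, 1, 7], [10, 4, 2]], columns=["A", "B", "C"])
ranks = df.rank(axis=1, method="dense", ascending=False).astype(int)

# Row and column positions of every cell whose dense rank is 2
rows, cols = (ranks == 2).to_numpy().nonzero()

# Translate the positions back to row and column labels
second_largest = pd.Series(df.columns[cols], index=df.index[rows])
print(second_largest.tolist())  # ['B', 'A', 'B']
```

With method='dense', tied values share a rank, so a row can contribute several entries for the same rank, or none at all if every value in the row is equal.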
