
Given a DataFrame df of probability distributions, I get the column with the maximum probability in each row with df.idxmax(axis=1), like this:

df['1k-th'] = df.idxmax(axis=1)

and get the following result:

(scroll the tables to the right if you cannot see all the columns)

    0           1           2           3           4           5           6           1k-th
0   0.114869    0.020708    0.025587    0.028741    0.031257    0.031619    0.747219    6
1   0.020206    0.012710    0.010341    0.012196    0.812495    0.113863    0.018190    4
2   0.023585    0.735475    0.091795    0.021683    0.027581    0.054217    0.045664    1
3   0.009834    0.009175    0.013165    0.016014    0.015507    0.899115    0.037190    5
4   0.023357    0.736059    0.088721    0.021626    0.027341    0.056289    0.046607    1

The question is: how do I get the columns of the 2nd, 3rd, etc. highest probabilities, so that I get the following result?

    0           1           2           3           4           5           6           1k-th   2-th
0   0.114869    0.020708    0.025587    0.028741    0.031257    0.031619    0.747219    6       0
1   0.020206    0.012710    0.010341    0.012196    0.812495    0.113863    0.018190    4       5
2   0.023585    0.735475    0.091795    0.021683    0.027581    0.054217    0.045664    1       2
3   0.009834    0.009175    0.013165    0.016014    0.015507    0.899115    0.037190    5       6
4   0.023357    0.736059    0.088721    0.021626    0.027341    0.056289    0.046607    1       2

Thank you!

Dmitriy Grankin
  • Your question has already been answered [here](https://stackoverflow.com/questions/39066260/get-first-and-second-highest-values-in-pandas-columns) – Hemant Rakesh Apr 07 '21 at 12:52

1 Answer


My own solution is not the prettiest, but it does its job and runs fast:

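import numpy as np

cols = [0, 1, 2, 3, 4, 5, 6]
for i in range(len(cols)):
    # label and value of the largest remaining probability in each row
    p[f'{i}k'] = p[cols].idxmax(axis=1)
    p[f'{i}k_v'] = p[cols].max(axis=1)

    # set the value just found to NaN so the next pass finds the next largest
    # (this overwrites the probability columns of p)
    for x in cols:
        p[x] = np.where(p[x] == p[f'{i}k_v'], np.nan, p[x])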

The loop does:

  • finds the largest value and its column index
  • drops the found value (sets it to NaN)
  • finds the 2nd largest value
  • drops the found value
  • etc ...
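
For comparison, the same ranking can probably be obtained without overwriting the probabilities by sorting each row once with NumPy. This is only a sketch of a different technique (np.argsort), not the loop described above; the helper name top_k_columns and the output column names top1, top2, ... are made up for illustration, and the sample rows are the first three from the question.

```python
import numpy as np
import pandas as pd

# Sample data: the first three rows from the question.
df = pd.DataFrame(
    [
        [0.114869, 0.020708, 0.025587, 0.028741, 0.031257, 0.031619, 0.747219],
        [0.020206, 0.012710, 0.010341, 0.012196, 0.812495, 0.113863, 0.018190],
        [0.023585, 0.735475, 0.091795, 0.021683, 0.027581, 0.054217, 0.045664],
    ]
)


def top_k_columns(frame, k):
    """Hypothetical helper: column labels of the k largest values in each row."""
    values = frame.to_numpy()
    # Sorting the negated values ascending gives positions in descending order
    # of the original values; keep only the first k positions per row.
    order = np.argsort(-values, axis=1, kind="stable")[:, :k]
    # Map positional indices back to the frame's column labels.
    labels = frame.columns.to_numpy()[order]
    names = [f"top{i + 1}" for i in range(k)]
    return pd.DataFrame(labels, index=frame.index, columns=names)


ranked = top_k_columns(df, 3)
print(ranked)
```

The result can be attached to the original frame with df.join(ranked). Unlike the loop, this leaves the probability columns untouched and, with the stable sort, tied values are ordered by column position rather than dropped together.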
Dmitriy Grankin