2

I have a dataframe, but I'm trying to add a new column which is a list of the column names in order of their values, for each row.

Searching has proved to be difficult, as the search terms have so much in common with doing a column sort overall. Instead, I'm trying to customize the list for each row.

df = pd.DataFrame([
  ["a",88,3,78,8,40  ],
  ["b",100,20,29,13,91  ],
  ["c",77,92,42,72,58  ],
  ["d",39,53,69,7,40  ],
  ["e",26,62,77,33,86  ],
  ["f",94,5,28,96,7  ]
], columns=['id','x1','x2','x3','x4','x5'])

have = df.set_index('id')


+----+-----+----+----+----+----+----------------------------+
| id | x1  | x2 | x3 | x4 | x5 |        ordered_cols        |
+----+-----+----+----+----+----+----------------------------+
| a  |  88 |  3 | 78 |  8 | 40 | ['x2','x4','x5','x3','x1'] |
| b  | 100 | 20 | 29 | 13 | 91 | ['x4','x2','x3','x5','x1'] |
| c  |  77 | 92 | 42 | 72 | 58 | …                          |
| d  |  39 | 53 | 69 |  7 | 40 | …                          |
| e  |  26 | 62 | 77 | 33 | 86 | …                          |
| f  |  94 |  5 | 28 | 96 |  7 | …                          |
+----+-----+----+----+----+----+----------------------------+



Umar.H
  • 22,559
  • 7
  • 39
  • 74
Josh
  • 1,493
  • 1
  • 13
  • 24

4 Answers4

2

try stack with sort_values and groupby

assuming your dataframe is called df

df["sorted_cols"] = (
    df.stack().sort_values().reset_index(1).groupby(level=0)["level_1"].agg(list)
)

print(df)

     x1  x2  x3  x4  x5           sorted_cols
id                                           
a    88   3  78   8  40  [x2, x4, x5, x3, x1]
b   100  20  29  13  91  [x4, x2, x3, x5, x1]
c    77  92  42  72  58  [x3, x5, x4, x1, x2]
d    39  53  69   7  40  [x4, x1, x5, x2, x3]
e    26  62  77  33  86  [x1, x4, x2, x3, x5]
f    94   5  28  96   7  [x2, x5, x3, x1, x4]
Umar.H
  • 22,559
  • 7
  • 39
  • 74
1

Here is a simple one line solution using apply and np.argsort :

import numpy as np

have["ordered_cols"] = have.apply(lambda row: have.columns[np.argsort(row.values)].values, axis=1)
have

enter image description here

Ismael EL ATIFI
  • 1,939
  • 20
  • 16
1
import pandas as pd

# add the new column 
df['ordered_cols'] = df.apply(lambda y: [x for _, x in sorted(zip(y, df.columns))], axis=1)

# display(df)
     x1  x2  x3  x4  x5          ordered_cols
id                                           
a    88   3  78   8  40  [x2, x4, x5, x3, x1]
b   100  20  29  13  91  [x4, x2, x3, x5, x1]
c    77  92  42  72  58  [x3, x5, x4, x1, x2]
d    39  53  69   7  40  [x4, x1, x5, x2, x3]
e    26  62  77  33  86  [x1, x4, x2, x3, x5]
f    94   5  28  96   7  [x2, x5, x3, x1, x4]
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
-1

Hay,

you can try looping over the rows and sorting the values in each row. The code below will do the trick:

ordered_cols = []
for index, row in have.iterrows():
    ordered_cols.append(list(have.sort_values(by=index, ascending=True, axis=1).columns))
have['ordered_cols'] = ordered_cols
have

Output:

x1  x2  x3  x4  x5  ordered_cols
id                      
a   88  3   78  8   40  [x2, x4, x5, x3, x1]
b   100     20  29  13  91  [x4, x2, x3, x5, x1]
c   77  92  42  72  58  [x3, x5, x4, x1, x2]
d   39  53  69  7   40  [x4, x1, x5, x2, x3]
e   26  62  77  33  86  [x1, x4, x2, x3, x5]
f   94  5   28  96  7   [x2, x5, x3, x1, x4]

I hope this was helpful.

Cheers!

coco18
  • 836
  • 8
  • 18