Pandas column: List of columns in specific order

Question

I have a dataframe and I'm trying to add a new column that holds, for each row, the list of column names ordered by that row's values.

Searching has proved difficult, because the search terms overlap so much with sorting a whole dataframe by a column. What I want instead is a separate ordering for each row.

df = pd.DataFrame([
  ["a",88,3,78,8,40  ],
  ["b",100,20,29,13,91  ],
  ["c",77,92,42,72,58  ],
  ["d",39,53,69,7,40  ],
  ["e",26,62,77,33,86  ],
  ["f",94,5,28,96,7  ]
], columns=['id','x1','x2','x3','x4','x5'])

have = df.set_index('id')

Desired output:

+----+-----+----+----+----+----+----------------------------+
| id | x1  | x2 | x3 | x4 | x5 |        ordered_cols        |
+----+-----+----+----+----+----+----------------------------+
| a  |  88 |  3 | 78 |  8 | 40 | ['x2','x4','x5','x3','x1'] |
| b  | 100 | 20 | 29 | 13 | 91 | ['x4','x2','x3','x5','x1'] |
| c  |  77 | 92 | 42 | 72 | 58 | …                          |
| d  |  39 | 53 | 69 |  7 | 40 | …                          |
| e  |  26 | 62 | 77 | 33 | 86 | …                          |
| f  |  94 |  5 | 28 | 96 |  7 | …                          |
+----+-----+----+----+----+----+----------------------------+

score 2 · Answer 1 · answered Dec 23 '20 at 20:40

Try stack with sort_values and groupby.

This assumes your dataframe is called df and already has id as its index (have in the question).

df["sorted_cols"] = (
    df.stack().sort_values().reset_index(1).groupby(level=0)["level_1"].agg(list)
)

print(df)

     x1  x2  x3  x4  x5           sorted_cols
id                                           
a    88   3  78   8  40  [x2, x4, x5, x3, x1]
b   100  20  29  13  91  [x4, x2, x3, x5, x1]
c    77  92  42  72  58  [x3, x5, x4, x1, x2]
d    39  53  69   7  40  [x4, x1, x5, x2, x3]
e    26  62  77  33  86  [x1, x4, x2, x3, x5]
f    94   5  28  96   7  [x2, x5, x3, x1, x4]
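To see why the chain works, it can help to split it into steps. This is only a sketch on two of the question's rows; the intermediate names (stacked, ordered, names, result) are mine and not part of the answer.

```python
import pandas as pd

# Two rows from the question, with 'id' as the index.
df = pd.DataFrame(
    [["a", 88, 3, 78, 8, 40], ["b", 100, 20, 29, 13, 91]],
    columns=["id", "x1", "x2", "x3", "x4", "x5"],
).set_index("id")

# 1. stack: one long Series indexed by (id, column name).
stacked = df.stack()

# 2. sort_values: sort every value across all rows at once.
ordered = stacked.sort_values()

# 3. reset_index(1): move the column-name level into a regular column.
#    The level has no name, so pandas calls the new column "level_1".
names = ordered.reset_index(1)

# 4. groupby(level=0): regroup by id; the order within each group is kept,
#    so each list comes out sorted by value.
result = names.groupby(level=0)["level_1"].agg(list)

# Assignment aligns on the 'id' index.
df["sorted_cols"] = result
print(df)
```

Because the sort is global, it is the groupby that puts each row's names back together.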

Ismael EL ATIFI · Answer 2 · 2020-12-23T20:32:11.857

score 1

Here is a simple one-line solution using apply and np.argsort:

import numpy as np

have["ordered_cols"] = have.apply(lambda row: have.columns[np.argsort(row.values)].values, axis=1)
have
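A possible variation, not part of the answer as posted: np.argsort can also be applied once to the whole 2-D array with axis=1, which avoids apply altogether. The names order, names and ordered_cols are illustrative, and the sketch uses only two of the question's rows.

```python
import numpy as np
import pandas as pd

have = pd.DataFrame(
    [["a", 88, 3, 78, 8, 40], ["f", 94, 5, 28, 96, 7]],
    columns=["id", "x1", "x2", "x3", "x4", "x5"],
).set_index("id")

# Column positions that would sort each row, computed for all rows at once.
order = np.argsort(have.to_numpy(), axis=1, kind="stable")

# Index the array of column names with those positions: one row of names per id.
names = have.columns.to_numpy()[order]

# Wrap the rows as Python lists so each cell holds a list, as in the question.
ordered_cols = pd.Series(names.tolist(), index=have.index)

have["ordered_cols"] = ordered_cols
print(have)
```

Note that the positions have to be computed before the new column is added, otherwise ordered_cols itself would be part of the array being sorted.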

edited Dec 23 '20 at 20:32

answered Dec 23 '20 at 20:26


Trenton McKinney · Accepted Answer · 2020-12-23T20:43:53.607

The solution by Manakin will be the fastest option, because it is vectorized.
Alternatively, use pandas.DataFrame.apply with axis=1 and a list comprehension to sort the column names by the row values.
The list comprehension is from SO: Sorting list based on values from another list, and does not require importing any additional packages.

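import pandas as pd

# df is assumed to have 'id' as its index (the question's `have`), so every column is numeric
# add the new column: pair each value with its column name, sort the pairs, keep the names
df['ordered_cols'] = df.apply(lambda y: [x for _, x in sorted(zip(y, df.columns))], axis=1)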

# display(df)
     x1  x2  x3  x4  x5          ordered_cols
id                                           
a    88   3  78   8  40  [x2, x4, x5, x3, x1]
b   100  20  29  13  91  [x4, x2, x3, x5, x1]
c    77  92  42  72  58  [x3, x5, x4, x1, x2]
d    39  53  69   7  40  [x4, x1, x5, x2, x3]
e    26  62  77  33  86  [x1, x4, x2, x3, x5]
f    94   5  28  96   7  [x2, x5, x3, x1, x4]
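One detail worth knowing, shown here on a made-up one-row frame (tied and its values are hypothetical, not from the question): sorted(zip(...)) compares (value, name) pairs, so equal values are ordered by column name, while a stable argsort keeps them in their original column order.

```python
import numpy as np
import pandas as pd

# Hypothetical row with a tie between columns 'b' and 'a'.
tied = pd.DataFrame({"b": [5], "a": [5], "c": [1]}, index=["r"])

# sorted(zip(...)): ties on the value fall back to comparing the column names.
by_zip = tied.apply(
    lambda y: [x for _, x in sorted(zip(y, tied.columns))], axis=1
)

# Stable argsort: tied values stay in their left-to-right column order.
by_argsort = tied.apply(
    lambda y: list(tied.columns[np.argsort(y.to_numpy(), kind="stable")]), axis=1
)

print(by_zip["r"])      # ['c', 'a', 'b']
print(by_argsort["r"])  # ['c', 'b', 'a']
```

With the question's data there are no ties within a row, so both approaches give the same lists.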

score -1 · Answer 4 · answered Dec 23 '20 at 20:23

Hey,

you can try looping over the rows and sorting the values in each row. The code below will do the trick:

ordered_cols = []
for index, row in have.iterrows():
    ordered_cols.append(list(have.sort_values(by=index, ascending=True, axis=1).columns))
have['ordered_cols'] = ordered_cols
have

Output:

     x1  x2  x3  x4  x5          ordered_cols
id                                           
a    88   3  78   8  40  [x2, x4, x5, x3, x1]
b   100  20  29  13  91  [x4, x2, x3, x5, x1]
c    77  92  42  72  58  [x3, x5, x4, x1, x2]
d    39  53  69   7  40  [x4, x1, x5, x2, x3]
e    26  62  77  33  86  [x1, x4, x2, x3, x5]
f    94   5  28  96   7  [x2, x5, x3, x1, x4]

I hope this was helpful.

Cheers!

We should avoid for loops when possible, and above all iterrows, which is very slow (itertuples is much faster). - Ismael EL ATIFI, Dec 23 '20 at 20:33
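If a loop is still preferred, the comment's suggestion to use itertuples could look roughly like this. It is a sketch on two of the question's rows, not the commenter's own code; cols and the loop body are my choices, and it reuses the sorted(zip(...)) idiom from the accepted answer.

```python
import pandas as pd

have = pd.DataFrame(
    [["a", 88, 3, 78, 8, 40], ["d", 39, 53, 69, 7, 40]],
    columns=["id", "x1", "x2", "x3", "x4", "x5"],
).set_index("id")

cols = list(have.columns)
ordered_cols = []

# index=False, name=None yields each row as a plain tuple of values in
# column order, without building a Series per row as iterrows does.
for row in have.itertuples(index=False, name=None):
    # Pair each value with its column name, sort the pairs, keep the names.
    ordered_cols.append([c for _, c in sorted(zip(row, cols))])

have["ordered_cols"] = ordered_cols
print(have)
```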
