Sorting columns in pandas dataframe based on column name

Question

I have a dataframe with over 200 columns. The issue is as they were generated the order is

['Q1.3','Q6.1','Q1.2','Q1.1',......]

I need to sort the columns as follows:

['Q1.1','Q1.2','Q1.3',.....'Q6.1',......]

Is there some way for me to do this within Python?

The question has a banner at the top "This question already has answers here: How to change the order of DataFrame columns? (34 answers) Closed last year." The question that it is saying is the same is a totally different question and this banner and link should therefore be removed. — Joey, Aug 20 '20 at 13:20
I am voting to reopen this question, I believe it has been erroneously marked as duplicate: the supplied duplicate asks how to *reorder* columns whereas this question asks how to *sort* by column name. Strictly speaking answers to the latter are a subset of the former, but users seeking an answer to the latter are unlikely to find it in the answers to the duplicate (the highest-voted answer which mentions sorting is currently 5th in vote total). — William Miller, Feb 01 '22 at 00:59
I'm in complete agreement, the linked question is completely different. Why nobody will agree to reopen it is beyond me. — fantabolous, Jun 04 '23 at 13:06

score 624 · Accepted Answer · edited Dec 18 '18 at 22:57

624

df = df.reindex(sorted(df.columns), axis=1)

This assumes that sorting the column names will give the order you want. If your column names won't sort lexicographically (e.g., if you want column Q10.3 to appear after Q9.1), you'll need to sort differently, but that has nothing to do with pandas.

edited Dec 18 '18 at 22:57

gcamargo

3,683
4
22
34

answered Jun 16 '12 at 21:12

BrenBarn

242,874
37
412
384

8

I like this because the same method can be used to sort rows (I needed to sort rows and columns). While it's the same method, you can omit the `axis` argument (or provide its default value, `0`), like `df.reindex_axis(sorted(non_sorted_row_index))` which is equivalent to `df.reindex(sorted(non_sorted_row_index))` – Nate Anderson Nov 17 '15 at 19:57
3

Note that re-indexing is not done in-place, so to actually apply the sort to the df you have to use `df = df.reindex_axis(...)`. Also, note that non-lexicographical sorts are easy with this approach, since the list of column names can be sorted separately into an arbitrary order and then passed to `reindex_axis`. This is not possible with the alternative approach suggested by @Wes McKinney (`df = df.sort_index(axis=1)`), which is however cleaner for pure lexicographical sorts. – WhoIsJack Jan 28 '18 at 23:49
1

not sure when '.reindex_axis' was deprecated, see message below. FutureWarning: '.reindex_axis' is deprecated and will be removed in a future version. Use '.reindex' instead. This is separate from the ipykernel package so we can avoid doing imports until – CodingMatters May 08 '18 at 08:27
Does this actually sort the columns of dataframe? From first glance it seems like this would just sort the column names then reset the index. – pbreach Jul 08 '18 at 21:13
7

`reindex_axis` is deprecated and results in `FutureWarning`. However, `.reindex` works fine. For the above example, use `df.reindex(columns=sorted(df.columns))` – Logan Sep 17 '18 at 17:43
[DataFrame.reindex documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html) – sgt pepper Dec 05 '19 at 11:22
For the ones who need lexicographical sort, check naturalsort package in Python. – Mehmet Burak Sayıcı Nov 01 '20 at 20:38
2

This is a good solution, but does not work if you have duplicate column names. The answer of @Wes McKinney works in that case. Hence, I think `df.sort_index(axis=1)` is the most appropriate solution. – Hedge92 Sep 01 '21 at 08:41
for larger dfs, `reindex` is very slow – fantabolous May 31 '23 at 01:49
FYI this also works on a MultiIndex. Thanks. – Bill Aug 21 '23 at 14:52

score 470 · Answer 2 · edited Jan 21 '19 at 21:02

470

You can also do more succinctly:

df.sort_index(axis=1)

Make sure you assign the result back:

df = df.sort_index(axis=1)

Or, do it in-place:

df.sort_index(axis=1, inplace=True)

edited Jan 21 '19 at 21:02

cs95

379,657
97
704
746

answered Jul 08 '12 at 18:56

Wes McKinney

101,437
32
142
108

4

remember to do `df = df.sort_index(axis=1)`, per @multigoodverse – GoJian Jan 06 '17 at 14:59
9

or modify `df` in-place with `df.sort_index(axis=1, inplace=True)` – jkr Mar 01 '17 at 17:12
5

also, `sort_index` is faster than `reindex`, in case devs worry about it – ExtractTable.com Nov 15 '19 at 15:57
3

this should be the accepted answer. – rsmith54 Jun 09 '21 at 19:06
2

Agree with @rsmith54 as this solution also works for duplicate column names. – Hedge92 Sep 01 '21 at 08:42

score 72 · Answer 3 · edited Jan 27 '19 at 06:19

72

You can just do:

df[sorted(df.columns)]

Edit: Shorter is

df[sorted(df)]

edited Jan 27 '19 at 06:19

cs95

379,657
97
704
746

answered Jun 24 '14 at 21:22

Ivelin

12,293
5
37
35

1

I get "'DataFrame' object is not callable" for this. Version: pandas 0.14. – multigoodverse Jan 29 '15 at 10:39
@lvelin, do you know why `sorted(df)` works, is it documented somewhere? – zyxue Apr 01 '20 at 17:23
@zyxue, `sorted` will be looking for the iterative class `magic methods` to figure out what to sort. Take a look at this question https://stackoverflow.com/questions/48868228/is-there-a-magic-method-for-sorted-in-python – Ivelin Apr 02 '20 at 19:29

score 35 · Answer 4 · answered Mar 11 '16 at 05:54

For several columns, You can put columns order what you want:

#['A', 'B', 'C'] <-this is your columns order
df = df[['C', 'B', 'A']]

This example shows sorting and slicing columns:

d = {'col1':[1, 2, 3], 'col2':[4, 5, 6], 'col3':[7, 8, 9], 'col4':[17, 18, 19]}
df = pandas.DataFrame(d)

You get:

col1  col2  col3  col4
 1     4     7    17
 2     5     8    18
 3     6     9    19

Then do:

df = df[['col3', 'col2', 'col1']]

Resulting in:

col3  col2  col1
7     4     1
8     5     2
9     6     3

score 27 · Answer 5 · edited May 23 '17 at 12:34

Tweet's answer can be passed to BrenBarn's answer above with

data.reindex_axis(sorted(data.columns, key=lambda x: float(x[1:])), axis=1)

So for your example, say:

vals = randint(low=16, high=80, size=25).reshape(5,5)
cols = ['Q1.3', 'Q6.1', 'Q1.2', 'Q9.1', 'Q10.2']
data = DataFrame(vals, columns = cols)

You get:

data

    Q1.3    Q6.1    Q1.2    Q9.1    Q10.2
0   73      29      63      51      72
1   61      29      32      68      57
2   36      49      76      18      37
3   63      61      51      30      31
4   36      66      71      24      77

Then do:

data.reindex_axis(sorted(data.columns, key=lambda x: float(x[1:])), axis=1)

resulting in:

data


     Q1.2    Q1.3    Q6.1    Q9.1    Q10.2
0    2       0       1       3       4
1    7       5       6       8       9
2    2       0       1       3       4
3    2       0       1       3       4
4    2       0       1       3       4

score 21 · Answer 6 · edited Nov 05 '15 at 22:08

21

If you need an arbitrary sequence instead of sorted sequence, you could do:

sequence = ['Q1.1','Q1.2','Q1.3',.....'Q6.1',......]
your_dataframe = your_dataframe.reindex(columns=sequence)

I tested this in 2.7.10 and it worked for me.

edited Nov 05 '15 at 22:08

approxiblue

6,982
16
51
59

answered Nov 05 '15 at 21:48

M.Z

253
1
4
9

score 16 · Answer 7 · answered Dec 08 '14 at 15:33

16

Don't forget to add "inplace=True" to Wes' answer or set the result to a new DataFrame.

df.sort_index(axis=1, inplace=True)

answered Dec 08 '14 at 15:33

burkesquires

1,345
1
14
20

score 4 · Answer 8 · answered Jan 29 '15 at 12:37

4

The quickest method is:

df.sort_index(axis=1)

Be aware that this creates a new instance. Therefore you need to store the result in a new variable:

sortedDf=df.sort_index(axis=1)

answered Jan 29 '15 at 12:37

multigoodverse

7,638
19
64
106

score 1 · Answer 9 · answered Jun 16 '12 at 21:14

1

The sort method and sorted function allow you to provide a custom function to extract the key used for comparison:

>>> ls = ['Q1.3', 'Q6.1', 'Q1.2']
>>> sorted(ls, key=lambda x: float(x[1:]))
['Q1.2', 'Q1.3', 'Q6.1']

answered Jun 16 '12 at 21:14

115
2

This works for lists in general and I am familiar with it. How do I apply it to a pandas DataFrame? – pythOnometrist Jun 17 '12 at 02:24
1

Not sure, I admit my answer was not specific to this library. – tweet Jun 17 '12 at 03:04

Roko Mijic · Answer 10 · 2017-07-24T10:04:15.597

One use-case is that you have named (some of) your columns with some prefix, and you want the columns sorted with those prefixes all together and in some particular order (not alphabetical).

For example, you might start all of your features with Ft_, labels with Lbl_, etc, and you want all unprefixed columns first, then all features, then the label. You can do this with the following function (I will note a possible efficiency problem using sum to reduce lists, but this isn't an issue unless you have a LOT of columns, which I do not):

def sortedcols(df, groups = ['Ft_', 'Lbl_'] ):
    return df[ sum([list(filter(re.compile(r).search, list(df.columns).copy())) for r in (lambda l: ['^(?!(%s))' % '|'.join(l)] + ['^%s' % i  for i in l ] )(groups)   ], [])  ]

score -3 · Answer 11 · answered Jun 20 '15 at 19:58

-3

print df.sort_index(by='Frequency',ascending=False)

where by is the name of the column,if you want to sort the dataset based on column

answered Jun 20 '15 at 19:58

Aravind Krishnakumar

2,727
1
28
25

Sorting columns in pandas dataframe based on column name

11 Answers11

Linked

Related