How to sort only selected columns in pandas dataframe

Question

I want to sort some columns in large Pandas dataframe. Those columns are in the middle of df and in end. They start with "R"

columns_list = df.columns.tolist()
columns_list
Out[17]: 
['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4','R1S4']

I would like to re-order like that:

['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2','R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3','R5S4']

Until now I did it manually:

df= df['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2','R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3','R5S4']

but new input data have more R columns and in every file it is different.

I would appreciate your advice.

Does this answer your question? [How to change the order of DataFrame columns?](https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns) — warped, Jun 02 '20 at 20:49

score 1 · Accepted Answer · answered Jun 02 '20 at 20:53

This is surprisingly challenging. I can't find a one-liner, and the easiest I can find is:

# find the R columns
mask = df.columns.str.match('^R\d*$')

# numpy array
columns = df.columns.values

# sort the R parts
columns[mask] = sorted(columns[mask])

# assign back
df = df.reindex(columns, axis=1)

score 0 · Answer 2 · answered Jun 02 '20 at 20:53

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(5, 6), columns=['x','a','c','y','b','z'])

cols = list('xacybz')

df = DataFrame(randn(10, len(cols)), columns=cols)

preordered = list('xyz')

new_order = preordered + list(df.columns - preordered)

df.reindex(columns=new_order)

score 0 · Answer 3 · answered Jun 02 '20 at 21:03

This should work, assuming the non "R" and "S" column names are not changing. If they are, I think you would have to do a regex thing to find the names of the columns you want to sort.

I am sorting the names here by length and then alphabetically, which I think looks like how you are doing it.

new_df_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
               'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3',
               'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4','R1S4']
df = pd.DataFrame(columns=new_df_columns)

base_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
               'Planet', 'ExtraterrestialSupplier',]
extra_cols = [name for name in new_df_columns if name not in base_columns]
sorted_extra = sorted(extra_cols, key = lambda x: (len(x),x))

df = df[base_columns + sorted_extra]

How to sort only selected columns in pandas dataframe

3 Answers3