3

I want to sort some columns in large Pandas dataframe. Those columns are in the middle of df and in end. They start with "R"

columns_list = df.columns.tolist()
columns_list
Out[17]: 
['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4','R1S4']

I would like to re-order like that:

['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2','R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3','R5S4']

Until now I did it manually:

df= df['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft', 'Planet', 'ExtraterrestialSupplier', 'R1', 'R2','R3', 'R4', 'R5', 'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R1S4', 'R5S2', 'R5S3','R5S4']

but new input data have more R columns and in every file it is different.

I would appreciate your advice.

Ellla
  • 41
  • 3
  • Does this answer your question? [How to change the order of DataFrame columns?](https://stackoverflow.com/questions/13148429/how-to-change-the-order-of-dataframe-columns) – warped Jun 02 '20 at 20:49

3 Answers3

1

This is surprisingly challenging. I can't find a one-liner, and the easiest I can find is:

# find the R columns
mask = df.columns.str.match('^R\d*$')

# numpy array
columns = df.columns.values

# sort the R parts
columns[mask] = sorted(columns[mask])

# assign back
df = df.reindex(columns, axis=1)
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
0
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(5, 6), columns=['x','a','c','y','b','z'])

cols = list('xacybz')

df = DataFrame(randn(10, len(cols)), columns=cols)

preordered = list('xyz')

new_order = preordered + list(df.columns - preordered)

df.reindex(columns=new_order)
Shamsul Masum
  • 337
  • 1
  • 2
  • 14
0

This should work, assuming the non "R" and "S" column names are not changing. If they are, I think you would have to do a regex thing to find the names of the columns you want to sort.

I am sorting the names here by length and then alphabetically, which I think looks like how you are doing it.

new_df_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
               'Planet', 'ExtraterrestialSupplier', 'R5', 'R2', 'R1', 'R4', 'R3',
               'S3', 'S2', 'S4', 'S1', 'S6', 'S5', 'R5S3', 'R5S2', 'R5S4','R1S4']
df = pd.DataFrame(columns=new_df_columns)

base_columns = ['Id', 'Name', 'Surname', 'Radius', 'Ship', 'Country', 'Spacecraft',
               'Planet', 'ExtraterrestialSupplier',]
extra_cols = [name for name in new_df_columns if name not in base_columns]
sorted_extra = sorted(extra_cols, key = lambda x: (len(x),x))

df = df[base_columns + sorted_extra]
Tom
  • 8,310
  • 2
  • 16
  • 36