Python Pandas: Sorting Columns

Question

I'm trying to sort the columns of a .csv file. These are the names and the order of the columns:

'Unnamed: 0', 'Unnamed: 1', 
'25Mg BLK', '25Mg 1', '25Mg 2', 
'44Ca BLK', '44Ca 1', '44Ca 2', 
'137Ba BLK', '137Ba 1', '137Ba 2', 
'25Mg 3', '25Mg 4', '25Mg 5', 
'44Ca 3', '44Ca 4', '44Ca 5', 
'137Ba 3', '137Ba 4', '137Ba 5',

This is the order I would like to have:

'Unnamed: 0', 'Unnamed: 1', 
'25Mg BLK', '25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5',
'44Ca BLK', '44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5',
'137Ba BLK', '137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5',

Currently my code looks like this:

import pandas as pd

df = pd.read_csv("real_data.csv", header=2)

df2 = df.reindex_axis(sorted(df.columns), axis=1)

print(df2)

df2.to_csv("sorted.csv")

With my current code I get the following result for the order of the columns:

'137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5', '137Ba BLK',
'25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5', '25Mg BLK', 
'44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5', '44Ca BLK'

So I already figured out that I have to pass a key function to sorted to specify the order I want, but I can't figure out a function that would do that.

Any input is highly appreciated!

Can you explain the logic behind your sorting more? Why does `137Ba BLK` come before `137Ba 1`? Unless you specify a clear sorting logic, it's hard for us (or for you) to write a good sorting function. — ASGM, Oct 30 '17 at 14:36
The file is the output of a device which measures different isotopes. Here 137Ba is the specific isotope. BLK stands for the blank or background value, and 1, 2, 3, ... is a series of measurements for that isotope. — qawert, Oct 30 '17 at 15:02
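
The ordering described in the comment above (isotopes by mass number, BLK before the numbered measurements) could be expressed as a key function for sorted. This is only a minimal sketch using the standard library: the name isotope_key and the rule that unparsed names such as 'Unnamed: 0' go first are assumptions made for illustration, and the sample column list is shortened and made up.

```python
import re

# Assumed name format: mass number, element symbol, whitespace, then BLK or a number.
_PATTERN = re.compile(r"(\d+)([A-Za-z]+)\s+(BLK|\d+)")


def isotope_key(name):
    """Hypothetical sort key: (mass number, element, measurement rank)."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        # Names that do not look like isotope columns are grouped first.
        return (0, name)
    mass, element, label = match.groups()
    # BLK gets rank -1 so it sorts before measurement 1, 2, 3, ...
    rank = -1 if label == "BLK" else int(label)
    return (1, int(mass), element, rank)


columns = [
    "Unnamed: 0", "Unnamed: 1",
    "25Mg BLK", "25Mg 1", "137Ba BLK", "137Ba 1",
    "25Mg 2", "137Ba 2", "25Mg 10",
]
ordered = sorted(columns, key=isotope_key)
print(ordered)
```

With a DataFrame, the same key could be applied as df.reindex(columns=sorted(df.columns, key=isotope_key)). Comparing the mass number and the measurement number as integers is what keeps 25Mg ahead of 137Ba and 2 ahead of 10.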

jezrael · Accepted Answer · 2017-10-30T15:06:31.630


Use a helper DataFrame built from the column names, sort it, and then reindex the columns by a.index:

c = df.columns
a = c[2:].to_series().str.extract(r'(\d+)([a-zA-Z]+)\s+(\d*)', expand=True)
#convert mass numbers to ints
a[0] = a[0].astype(int)
#convert to floats, non-existent numbers (BLK) generate NaNs
a[2] = pd.to_numeric(a[2], errors='coerce')
a = a.sort_values([0,1,2], na_position='first')
print (a)
             0   1    2
25Mg BLK    25  Mg  NaN
25Mg 1      25  Mg  1.0
25Mg 2      25  Mg  2.0
25Mg 3      25  Mg  3.0
25Mg 4      25  Mg  4.0
25Mg 5      25  Mg  5.0
44Ca BLK    44  Ca  NaN
44Ca 1      44  Ca  1.0
44Ca 2      44  Ca  2.0
44Ca 3      44  Ca  3.0
44Ca 4      44  Ca  4.0
44Ca 5      44  Ca  5.0
137Ba BLK  137  Ba  NaN
137Ba 1    137  Ba  1.0
137Ba 2    137  Ba  2.0
137Ba 3    137  Ba  3.0
137Ba 4    137  Ba  4.0
137Ba 5    137  Ba  5.0

df = df.reindex(columns=c[:2].tolist() + a.index.tolist())
print (df)
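
To try the approach without the original file, here is a self-contained sketch on a small made-up DataFrame. The column subset and the values are invented for illustration, and reindex(columns=...) is used for the final step because reindex_axis was removed in pandas 1.0.

```python
import pandas as pd

# Made-up single-row frame with a shortened version of the question's columns.
df = pd.DataFrame(
    [[0, 1, 2, 3, 4, 5, 6, 7]],
    columns=[
        "Unnamed: 0", "Unnamed: 1",
        "25Mg BLK", "25Mg 1", "137Ba BLK", "137Ba 1",
        "25Mg 2", "137Ba 2",
    ],
)

c = df.columns
# Split each name into mass number, element symbol and measurement number.
parts = c[2:].to_series().str.extract(r"(\d+)([a-zA-Z]+)\s+(\d*)", expand=True)
parts[0] = parts[0].astype(int)
# BLK has no measurement number, so it becomes NaN here.
parts[2] = pd.to_numeric(parts[2], errors="coerce")
# NaN first puts BLK ahead of measurements 1, 2, ...
parts = parts.sort_values([0, 1, 2], na_position="first")

result = df.reindex(columns=c[:2].tolist() + parts.index.tolist())
print(result.columns.tolist())
```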

edited Oct 30 '17 at 15:06

answered Oct 30 '17 at 14:42


Oops, I forgot that, you need `c[:2].tolist() + a.index.tolist()` – jezrael Oct 30 '17 at 15:06
Thanks for your response! `a = c[2:].to_series().str.extract('(\d+)([a-zA-Z]+)\s+(\d*)', expand=True)` What is the c in this line? – qawert Oct 30 '17 at 15:06
`c = df.columns` – jezrael Oct 30 '17 at 15:06
Works exactly the way I wanted! Thanks a lot! – qawert Oct 30 '17 at 15:17

score 1 · Answer 2 · answered Oct 30 '17 at 14:43


See this answer here: https://stackoverflow.com/a/33555435/8239103 It seems to do what you want. For clarity I'll post the code here.

sequence = [Your sequence as a list as above]
your_dataframe = your_dataframe.reindex(columns=sequence)

Keith Cargill

Thanks for your response. I would like to have a program which sorts the column without any input, as the files I'm working with may have different numbers of elements. – qawert Oct 30 '17 at 15:03

BENY · Answer 3 · 2017-10-30T15:04:09.323

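from natsort import natsorted

# assumption: l1 holds the measurement columns, without the two 'Unnamed' ones
l1 = df.columns[2:].tolist()
# swap BLK for zeros so it sorts before the numbered measurements
l1=list(map(lambda x: x.replace('BLK', '0000000'), l1))
l1=natsorted(l1)
l1=list(map(lambda x: x.replace('0000000', 'BLK'), l1))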

l1
Out[1125]: 
['25Mg BLK',
 '25Mg 1',
 '25Mg 2',
 '25Mg 3',
 '25Mg 4',
 '25Mg 5',
 '44Ca BLK',
 '44Ca 1',
 '44Ca 2',
 '44Ca 3',
 '44Ca 4',
 '44Ca 5',
 '137Ba BLK',
 '137Ba 1',
 '137Ba 2',
 '137Ba 3',
 '137Ba 4',
 '137Ba 5']

Then reindex the columns, keeping the first two in front: df.reindex(columns=df.columns[:2].tolist() + l1)
