4

I'm trying to sort the columns of a .csv file. These are the names and the order of the columns:

'Unnamed: 0', 'Unnamed: 1', 
'25Mg BLK', '25Mg 1', '25Mg 2', 
'44Ca BLK', '44Ca 1', '44Ca 2', 
'137Ba BLK', '137Ba 1', '137Ba 2', 
'25Mg 3', '25Mg 4', '25Mg 5', 
'44Ca 3', '44Ca 4', 44Ca 5', 
'137Ba 3', '137Ba 4', '137Ba 5',

This is the order I would like to have:

'Unnamed: 0', 'Unnamed: 1', 
'25Mg BLK', '25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5',
'44Ca BLK', '44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', 44Ca 5',
'137Ba BLK', '137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5',

Currently my code looks like this:

import pandas as pd

df = pd.read_csv("real_data.csv", header=2)

df2 = df.reindex_axis(sorted(df.columns), axis=1)

print(df2)

df2.to_csv("sorted.csv")

With my current code I get the following result for the order of the columns:

'137Ba 1', '137Ba 2', '137Ba 3', '137Ba 4', '137Ba 5', '137Ba BLK',
'25Mg 1', '25Mg 2', '25Mg 3', '25Mg 4', '25Mg 5', '25Mg BLK', 
'44Ca 1', '44Ca 2', '44Ca 3', '44Ca 4', '44Ca 5', '44Ca BLK'

So I already figured out that I have to pass a function to the sorted function to specify how I want it to sort it, but I can't figure out a function which would do that.

Any input is highly appreciated!

qawert
  • 91
  • 7
  • Can you explain the logic behind your sorting more? Why does `137Ba BLK` come before `137Ba 1`? Unless you specify a clear sorting logic, it's hard for us (or for you) to write a good sorting function. – ASGM Oct 30 '17 at 14:36
  • The file is the output of a device which measures different isotopes. Here 137Ba is the specific isotope. BLK stands for blank or background value and 1,2,3,... is series of measurements for that isotope. – qawert Oct 30 '17 at 15:02

3 Answers3

3

Use helper DataFrame, sort columns and then reindex by a.index:

c = df.columns
a = c[2:].to_series().str.extract('(\d+)([a-zA-Z]+)\s+(\d*)', expand=True)
#convert ints
a[0] = a[0].astype(int)
#convert to floats, non exis numbers generate NaNs
a[2] = pd.to_numeric(a[2], errors='coerce')
a = a.sort_values([0,1,2], na_position='first')
print (a)
             0   1    2
25Mg BLK    25  Mg  NaN
25Mg 1      25  Mg  1.0
25Mg 2      25  Mg  2.0
25Mg 3      25  Mg  3.0
25Mg 4      25  Mg  4.0
25Mg 5      25  Mg  5.0
44Ca BLK    44  Ca  NaN
44Ca 1      44  Ca  1.0
44Ca 2      44  Ca  2.0
44Ca 3      44  Ca  3.0
44Ca 4      44  Ca  4.0
44Ca 5      44  Ca  5.0
137Ba BLK  137  Ba  NaN
137Ba 1    137  Ba  1.0
137Ba 2    137  Ba  2.0
137Ba 3    137  Ba  3.0
137Ba 4    137  Ba  4.0
137Ba 5    137  Ba  5.0

df = df.reindex_axis(c[:2].tolist() + a.index.tolist(), axis=1)
print (df)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
1

See this answer here: https://stackoverflow.com/a/33555435/8239103 It seems to do what you want. For clarity I'll post the code here.

sequence = [Your sequence as a list as above]
your_dataframe = your_dataframe.reindex(columns=sequence)
  • Thanks for your response. I would like to have a program which sorts the column without any input, as the files I'm working with may have different numbers of elements. – qawert Oct 30 '17 at 15:03
1
from natsort import natsorted, ns

l1=list(map(lambda x: x.replace('BLK', '0000000'), l1))
l1=natsorted(l1)
l1=list(map(lambda x: x.replace('0000000', 'BLK'), l1))

l1
Out[1125]: 
['25Mg BLK',
 '25Mg 1',
 '25Mg 2',
 '25Mg 3',
 '25Mg 4',
 '25Mg 5',
 '44Ca BLK',
 '44Ca 1',
 '44Ca 2',
 '44Ca 3',
 '44Ca 4',
 '44Ca 5',
 '137Ba BLK',
 '137Ba 1',
 '137Ba 2',
 '137Ba 3',
 '137Ba 4',
 '137Ba 5']

Then doing df.reindex(l1)

BENY
  • 317,841
  • 20
  • 164
  • 234