Python sorting numbers in a multicolumn file

Question

I have a data file with four columns, and I want to produce an output file sorted by the first column. The data file (rough.dat) looks like:

1    2    4    9
11    2    3    5
6    5    7    4
100    6    1    2

The code I am using to sort by the first column is:

with open('rough.dat','r') as f:
    lines=[line.split() for line in f]

a=sorted(lines, key=lambda x:x[0])
print a

The result I am getting is strange, and I think I'm doing something silly!

[['1', '2', '4', '9'], ['100', '6', '1', '2'], ['11', '2', '3', '5'], ['6', '5', '7', '4']]

As you can see, the first column is not sorted in ascending numerical order. Instead, the numbers starting with '1' come first, and 100 even comes before 11!

score 0 · Answer 1 · answered Jul 03 '16 at 11:31

Strings are compared lexicographically (dictionary order):

>>> '100' < '6'
True
>>> int('100') < int('6')
False
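
To see why '100' lands before '11' and '6', it may help to walk through the comparison one character at a time, which is roughly what Python does for strings. A small sketch (the name first_column is only for illustration and holds the first-column values from rough.dat):

```python
# First-column values as read from the file: still strings, not numbers.
first_column = ['1', '11', '6', '100']

# String comparison goes character by character:
# '100' vs '11' -> '1' == '1', then '0' < '1', so '100' sorts first.
# '11' vs '6'   -> '1' < '6' decides immediately; length is irrelevant.
as_text = sorted(first_column)

# Passing key=int compares the numeric values instead.
as_numbers = sorted(first_column, key=int)

print(as_text)     # ['1', '100', '11', '6']
print(as_numbers)  # ['1', '6', '11', '100']
```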

Converting the first item to int in the key function gives the numeric ordering you want:

a = sorted(lines, key=lambda x: int(x[0]))
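
Since the goal is a sorted output file, here is a minimal sketch of the whole round trip. The helper name sort_rows and the output filename sorted.dat are assumptions for illustration, not part of the question, and the sample lines stand in for the contents of rough.dat. It also assumes the first column holds integers; int() would raise a ValueError on a value like '1.5', in which case float could be used as the conversion instead.

```python
def sort_rows(lines):
    """Split each line into columns and sort the rows numerically by the first column."""
    rows = [line.split() for line in lines if line.strip()]  # skip blank lines
    return sorted(rows, key=lambda row: int(row[0]))


# Sample data standing in for rough.dat.
sample = [
    "1    2    4    9",
    "11    2    3    5",
    "6    5    7    4",
    "100    6    1    2",
]

sorted_rows = sort_rows(sample)

# Write the rows back out, tab-separated ('sorted.dat' is a hypothetical name).
with open('sorted.dat', 'w') as out:
    for row in sorted_rows:
        out.write('\t'.join(row) + '\n')

print(sorted_rows)
```

With the real file, the same helper can be fed the open file object directly, e.g. sort_rows(f) inside a with open('rough.dat') as f: block, since iterating a file yields its lines.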

falsetru

score 0 · Answer 2 · answered Jul 03 '16 at 11:34

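Your numbers are being sorted as text because they are strings, not integers. As a more NumPy-oriented approach, you can load the data with np.loadtxt and then reorder the rows by the first column using argsort. Note that array.sort(axis=1) would instead sort the values within each row, which is not what is asked here.

import numpy as np

array = np.loadtxt('rough.dat')
# argsort on column 0 gives the row order that sorts the first column
array = array[array[:, 0].argsort()]

print array
[[   1.    2.    4.    9.]
 [   6.    5.    7.    4.]
 [  11.    2.    3.    5.]
 [ 100.    6.    1.    2.]]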
