sort files in a specific order

Question

I want to know how I can sort the filenames as they are in the directory. For example, I have the following names:

1_00000_6.54.csv
2_00000_1.70.csv
3_00000_1.70.csv
...
10_00000_1.70.csv
11_00000_1.70.csv
...

With the following python code I get the following order:

 import os

 def get_pixelist(path):
     # Full paths of all .csv files in the directory (os.listdir order is arbitrary).
     return [os.path.join(path,f) for f in os.listdir(path) if f.endswith('.csv')]

 def group_uniqmz_intensities(path):
     pxlist = sorted(get_pixelist(path))

gives:

1_00000_6.54.csv
10_00000_1.70.csv
11_00000_1.70.csv
...
2_00000_1.70.csv
...
3_00000_1.70.csv
...

I want the order shown before.

Good question. What you're asking is sometimes referred to as the ["natural sort order"](http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html); it would make sense to make a Python `key` for that. — Kos, Jan 30 '13 at 10:33

Martijn Pieters · Answer 1 · 2013-01-30T12:09:45.940

2

The easiest would be to zero-pad the filenames when sorting:

def group_uniqmz_intensities(path):
    # Pad the bare file name, not the full path that get_pixelist() returns.
    pxlist = sorted(get_pixelist(path), key=lambda f: os.path.basename(f).rjust(17, '0'))

which pads each filename to 17 characters with 0 characters when sorting, so 1_00000_6.54.csv is padded to 01_00000_6.54.csv while 10_00000_1.70.csv is left as is. Lexicographically, 01 sorts before 10.

I picked 17 as a hardcoded value to simplify things; you could find the required value automatically by using this instead:

def group_uniqmz_intensities(path):
    files = get_pixelist(path)
    padsize = max(len(f) for f in files)
    pxlist = sorted(files, key=lambda f: f.rjust(padsize, '0'))
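To see the padding idea in isolation, here is a minimal sketch. The helper name `padded_sort` and the sample list are made up for illustration, and the key is applied to the bare file name rather than to the full path.

```python
import os

def padded_sort(paths):
    """Sort paths by file name, left-padded with zeros to a common width."""
    if not paths:
        return []
    # Width of the longest file name; shorter names are padded up to it.
    width = max(len(os.path.basename(p)) for p in paths)
    return sorted(paths, key=lambda p: os.path.basename(p).rjust(width, '0'))

names = ['10_00000_1.70.csv', '2_00000_1.70.csv',
         '1_00000_6.54.csv', '11_00000_1.70.csv']
print(padded_sort(names))
# ['1_00000_6.54.csv', '2_00000_1.70.csv', '10_00000_1.70.csv', '11_00000_1.70.csv']
```

Padding only yields numeric order when the part after the leading number has the same length in every name, as it does in the question's examples.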

edited Jan 30 '13 at 12:09

answered Jan 30 '13 at 10:28

Martijn Pieters


Nice one! To make sure it works properly, 17 should be changed to the length of the longest filename. Something like max_length = len(max(get_pixelist(path), key=lambda x: len(x))). – Dimitri Vorona Jan 30 '13 at 10:37
I'd use `str.zfill` instead of `str.rjust` – Bakuriu Jan 30 '13 at 10:38
thanks Martijn. This is not what I want. I want the following order: 1_00000_6.54.csv 2_00000_1.70.csv 3_00000_1.70.csv ... 10_00000_1.70.csv 11_00000_1.70.csv – Hocine Ben Jan 30 '13 at 10:38
@HocineBen That's exactly what you obtain with this solution. – Bakuriu Jan 30 '13 at 10:40
@HocineBen see my comment above. Try changing 17 to 25 (or some big number) and see if it helps. – Dimitri Vorona Jan 30 '13 at 10:42
I get this (with the solution): 10_00000_6.54.csv 11_00000_1.70.csv 1_00000_1.70 ... 2_00000_1.70.csv ... 3_00000_1.70.csv – Hocine Ben Jan 30 '13 at 10:46
@Bakuriu: You could do that too; we don't need to handle signs (`-` or `+` here) so using `.rjust` is probably slightly faster. – Martijn Pieters Jan 30 '13 at 12:08
@HocineBen: it could be that I miscounted; use a higher number than 17 or determine the length automatically using the `max()` line I added. – Martijn Pieters Jan 30 '13 at 12:09
@MartijnPieters Testing a bit with `timeit` it seems `zfill` is slightly *faster* than `rjust`. At least in python2.7.3 on linux. – Bakuriu Jan 30 '13 at 13:04

score 0 · Answer 2 · answered Jan 30 '13 at 10:32

Since '1' < '_' you get the second ordering. You can achieve your goal by giving a key-function to sorted:

 def group_uniqmz_intensities(path):
     pxlist = sorted(get_pixelist(path), key=lambda x: int(os.path.basename(x).split("_")[0]))

Please make sure ALL of your files follow the same naming scheme ({number}_{rest}.csv); otherwise int() will raise a ValueError.
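One hedged way to avoid the ValueError is to let names that do not match the scheme sort after the numbered ones. The helper `numeric_prefix_key` and the `data/` paths below are hypothetical, not part of the original answer.

```python
import os

def numeric_prefix_key(path):
    """Key for names like '12_rest.csv': sort by the integer before the first '_'."""
    name = os.path.basename(path)
    prefix = name.partition('_')[0]
    try:
        # (0, number, name): numbered files first, in numeric order.
        return (0, int(prefix), name)
    except ValueError:
        # (1, 0, name): everything else afterwards, in plain string order.
        return (1, 0, name)

paths = ['data/10_00000_1.70.csv', 'data/notes.csv',
         'data/2_00000_1.70.csv', 'data/1_00000_6.54.csv']
print(sorted(paths, key=numeric_prefix_key))
# ['data/1_00000_6.54.csv', 'data/2_00000_1.70.csv', 'data/10_00000_1.70.csv', 'data/notes.csv']
```

Taking the base name first matters here because `get_pixelist` returns full paths, and the directory part is not a number.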

EDIT: Martijn Pieters provides a more elegant solution.

score 0 · Answer 3 · edited May 23 '17 at 10:24


Based on this answer for alphanumerical sorting:

def group_uniqmz_intensities(path):
    pxlist = sorted(get_pixelist(path), key=lambda filename: int(os.path.basename(filename).partition('_')[0]))


Community


answered Jan 30 '13 at 10:37

BioGeek


Kos · Answer 4 · 2013-01-30T10:53:09.783

Here's a trivial implementation of natural ordering, assuming that your fields are all split by _:

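def int_if_possible(s):
    # Convert purely numeric fields to int; leave everything else as a string.
    try:
        return int(s)
    except ValueError:
        return s


>>> # Build a list rather than a lazy map object so the keys can be compared in Python 3.
>>> sorted(s, key=lambda name: [int_if_possible(part) for part in name.split('_')])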
['1_00000_6.54.csv',
 '2_00000_1.70.csv',
 '3_00000_1.70.csv',
 '10_00000_1.70.csv',
 '11_00000_1.70.csv']

This implementation leverages the fact that lists get compared element-by-element. If the elements are convertible to ints, we compare them as ints, otherwise we fall back to string comparison.
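The element-by-element comparison can be checked directly. This is a small illustrative sketch; `split_key` is a name introduced here, and the last few lines show a Python 3 caveat: an int and a str in the same position cannot be ordered, so the fallback to string comparison only works when each position holds the same type in every name.

```python
def int_if_possible(s):
    try:
        return int(s)
    except ValueError:
        return s

def split_key(name):
    # '10_00000_1.70.csv' -> [10, 0, '1.70.csv']
    return [int_if_possible(part) for part in name.split('_')]

# As lists of fields, 2 < 10 decides the comparison at the first element.
print(split_key('2_00000_1.70.csv') < split_key('10_00000_1.70.csv'))  # True
# As plain strings, '2' > '1' decides it the other way.
print('2_00000_1.70.csv' < '10_00000_1.70.csv')                        # False

# Python 3: comparing an int field with a str field raises TypeError.
try:
    split_key('1_a.csv') < split_key('x_a.csv')
except TypeError as exc:
    print('cannot compare:', exc)
```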

Edit: A more elaborate solution for natural sorting is presented here: Natural string sorting.

It's pretty clever: it uses a regex \d+|\D+ to split input strings into alternating runs of digits and non-digits. Then numbers are compared numerically, and non-numbers alphabetically.
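A minimal Python 3 sketch of that idea, using `re.findall` with the pattern `\d+|\D+`. The function name `natural_key` is chosen here for illustration and is not taken from the linked solution.

```python
import re

def natural_key(text):
    """Split text into digit and non-digit runs; digit runs become ints."""
    return [int(chunk) if chunk.isdecimal() else chunk
            for chunk in re.findall(r'\d+|\D+', text)]

names = ['10_00000_1.70.csv', '1_00000_6.54.csv', '11_00000_1.70.csv',
         '2_00000_1.70.csv', '3_00000_1.70.csv']
print(natural_key('10_00000_1.70.csv'))
# [10, '_', 0, '_', 1, '.', 70, '.csv']
print(sorted(names, key=natural_key))
# ['1_00000_6.54.csv', '2_00000_1.70.csv', '3_00000_1.70.csv',
#  '10_00000_1.70.csv', '11_00000_1.70.csv']
```

As written, this assumes all names start with the same kind of run (all digits or all non-digits); mixing the two puts an int and a str in the same position, which Python 3 refuses to compare.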
