2

I have the following problem:

I need to load several data files. The files are named by my device like:

meas98.dat
meas99.dat
meas100.dat
meas101.dat

In other words, there are no leading zeros. Therefore, if I get the filenames via

os.listdir

they are ordered alphabetically, meaning "meas100.dat" comes first. This is obviously not what I want. What is the most elegant way to get them in numeric order instead?

The (inelegant) way I came up with is:

  • load the filenames
  • extract the filenumber
  • order the filenumber (get the indices)
  • order the filenames with those indices
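
The steps above can be sketched as follows. This is only an illustration of the manual approach: the helper name `sort_by_file_number` is made up for this example, and the hard-coded list stands in for the result of `os.listdir`.

```python
import re


def sort_by_file_number(filenames):
    # Extract the file number (first run of digits) from each name.
    numbers = [int(re.search(r'\d+', name).group()) for name in filenames]
    # Get the indices that would put the numbers in ascending order.
    order = sorted(range(len(numbers)), key=numbers.__getitem__)
    # Reorder the filenames with those indices.
    return [filenames[i] for i in order]


# Stand-in for os.listdir(...); the names come from the question.
filenames = ['meas100.dat', 'meas101.dat', 'meas98.dat', 'meas99.dat']
ordered = sort_by_file_number(filenames)
print(ordered)
```

Passing a `key` function to `sorted()` collapses the last three steps into one call, which is what the answers below do.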

I am pretty sure Python has something built in that can do this while loading the files...

Glostas
  • 5
    Possible duplicate of [Does Python have a built in function for string natural sort?](http://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort) – Mureinik Jan 02 '17 at 10:29
  • Pass a custom lambda as the key to the sort function, e.g. sort by int(filename.split('.')[0].replace('meas', '')) – Murali Mopuru Jan 02 '17 at 10:38

3 Answers

7
l = ['meas98.dat',
    'meas99.dat',
    'meas100.dat',
    'meas101.dat']
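# Slice off the 'meas' prefix and the '.dat' suffix. str.strip('meas.dat') would
# strip a set of characters from both ends, not those exact substrings.
l.sort(key=lambda i: int(i[len('meas'):-len('.dat')]))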

There is also a Pythonic way to list the files, using the pathlib module:

These are the files in my terminal:

~/so$ ls
meas100.dat  meas98.dat  meas99.dat

These are the files in Python:

from pathlib import Path
p = Path('/home/li/so/')
list(p.iterdir())
[PosixPath('/home/li/so/meas99.dat'),
 PosixPath('/home/li/so/meas98.dat'),
 PosixPath('/home/li/so/meas100.dat')]

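It looks as if pathlib has done the sort for you, so you can give it a try. Note, though, that the listing above is not in numeric order, and iterdir() makes no promise about ordering.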
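
Since `iterdir()` yields entries in arbitrary order, the paths can be sorted explicitly with a numeric key. A minimal sketch, assuming every file is named `meas<number>.dat`; the helpers `numeric_stem` and `sorted_measurements` are hypothetical names, and the temporary directory only exists to make the example self-contained:

```python
import tempfile
from pathlib import Path


def numeric_stem(path, prefix='meas'):
    # 'meas98.dat' has the stem 'meas98'; drop the prefix and convert to int.
    return int(path.stem[len(prefix):])


def sorted_measurements(directory):
    # glob() does not guarantee any order either, so sort explicitly.
    return sorted(Path(directory).glob('meas*.dat'), key=numeric_stem)


# Demo in a throwaway directory instead of a real data folder.
with tempfile.TemporaryDirectory() as tmp:
    for n in (100, 98, 99):
        (Path(tmp) / f'meas{n}.dat').touch()
    names = [p.name for p in sorted_measurements(tmp)]

print(names)
```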

宏杰李
  • @Glostas please accept an answer to close this question, this will save other people's time. – 宏杰李 Jan 06 '17 at 12:08
  • As of today [the documentation says](https://docs.python.org/3/library/pathlib.html#pathlib.Path.iterdir): "The children are yielded in arbitrary order" – Mikhail Gerasimov Dec 06 '20 at 14:15
3

Use the slice [4:-4] to extract just the number from each filename; sorted() then uses it as the sort key.

# random order
l = [
    'meas98.dat',
    'meas100.dat',
    'meas99.dat',
    'meas101.dat',
    'meas1.dat',
]

l = sorted(l, key=lambda x: int(x[4:-4]))  # sorted() returns a new list

print(l)

Result:

['meas1.dat', 'meas98.dat', 'meas99.dat', 'meas100.dat', 'meas101.dat']
furas
2

Perhaps this will suit your problem:

import re

l = ['meas100.dat',
     'meas101.dat',
     'meas98.dat',
     'meas99.dat']


sorted(l, key=lambda x: int(re.match(r'\D*(\d+)', x).group(1)))

Output:

['meas98.dat', 'meas99.dat', 'meas100.dat', 'meas101.dat']
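
The key above only looks at the first number in each name. If filenames ever contain several numbers, a more general "natural sort" key may help. A sketch, not a built-in: `natural_key` is a hypothetical helper, and the `run…_meas….dat` names are invented for the example.

```python
import re


def natural_key(name):
    # re.split with a capturing group keeps the digit runs, so the parts
    # alternate between text and digits: text at even indices, digits at odd.
    # Digit runs are compared as integers, text case-insensitively.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'(\d+)', name)]


names = ['run2_meas10.dat', 'run10_meas1.dat', 'run2_meas9.dat']
ordered = sorted(names, key=natural_key)
print(ordered)
```

Because the parts always alternate in the same way, integers are only ever compared with integers and strings with strings.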
Gustavo Bezerra
  • 1
    I have no idea why you got downvoted. I selected it as accepted, since it is in my opinion the most elegant solution – Glostas Mar 27 '18 at 13:58