1

Is there a way (like PHP's natsort) to sort a list of filenames that differ only in one number and are returned unsorted by glob? For example:

test1.dat
test7.dat
.
.
test10.dat
test3.dat

When I do a naive sort the result is

test1.dat
test10.dat
test2.dat
.
.
.

because the strings are compared character by character, so "10" sorts before "2" :) I could construct something with for loops and a range (or a generator with range), but this feels somewhat unpythonic...

kratenko
BandGap
  • See also [this](http://stackoverflow.com/questions/10866277/python-sort-strings-started-with-digits) and [that](http://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort) and 10+ [other links](http://stackoverflow.com/search?q=natsort+python). Why don't people search before posting? – georg Jun 08 '12 at 11:11
  • @thg435: I did. All the search results I got had nothing to do with the problem at hand. But in light of the obvious duplicates you're right, I should have searched more. Btw you could write a better semantic search if users like me annoy you too much :) – BandGap Jun 08 '12 at 11:47
  • Don't take it personally. Duplicate content - this is what is annoying. You know, when you search for a problem and open 10 Stackoverflow tabs only to find that 9 of them contain nearly identical questions with identical responses. That sucks. – georg Jun 08 '12 at 11:55

3 Answers

4
import glob
sorted(glob.glob('*.dat'), key=lambda x: int(x.split('.')[0][4:]))

This takes the filename, strips everything from the first dot onward, and converts the characters after the first four (the prefix 'test') to an integer. It works for 'testXXX.dat' where XXX is an integer of any length.
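
A minimal sketch of that key applied to a hard-coded list. The names in `files` and the helper name `number_after_prefix` are made up for illustration; in practice the list would come from `glob.glob('*.dat')`.

```python
# Hypothetical filenames standing in for the result of glob.glob('*.dat').
files = ['test1.dat', 'test7.dat', 'test10.dat', 'test3.dat']

def number_after_prefix(name):
    # 'test10.dat' -> 'test10' -> '10' -> 10
    return int(name.split('.')[0][4:])

# sorted() returns a new list; `files` itself keeps its original order.
ordered = sorted(files, key=number_after_prefix)
print(ordered)  # ['test1.dat', 'test3.dat', 'test7.dat', 'test10.dat']
```

To reorder the existing list instead of building a new one, `files.sort(key=number_after_prefix)` should do the same job in place.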

eumiro
  • Hi, I have a similar problem. Can someone point out the bug in my code? My list contains 58 filenames: [ 'RGI_0.pdf', 'RGI_1.pdf', 'RGI_10.pdf', 'RGI_11.pdf', .. 'RGI_57.pdf',.. 'RGI_6.pdf'] My one-liner seems to just sort the default way: sorted(pdf_files, key=lambda x: int(x[x.find('_')+1:x.find('.')])) – Zakir Dec 05 '18 at 03:22
  • Just realized sorted() returns a new list and leaves the original unchanged. I used .sort() instead and it works like a charm. Thanks! – Zakir Dec 06 '18 at 00:29
0

A solution using re (untested):

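import glob
import re

# Non-greedy prefix so that (\d+) captures the entire trailing number.
prefix_number = re.compile(r'(.*?)(\d+)\.dat$')

def sortkey(ss):
    match = prefix_number.match(ss)
    if match:
        # group(1) is the prefix, group(2) the number.
        return (match.group(1), int(match.group(2)))
    return (ss,)

sorted(glob.iglob('*.dat'), key=sortkey)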

This solution works by splitting the filenames by prefix (e.g. "test") and integer (e.g. 1). It then sorts first by prefix and second by integer. Of course, the downside is that you need re and a slightly more complicated solution.
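
To illustrate the "prefix first, integer second" ordering, here is a small sketch on a list with two prefixes. The filenames and the name `prefix_number_key` are assumptions for the example, and the pattern uses a non-greedy prefix so that the whole trailing number lands in the second group.

```python
import re

# Non-greedy prefix: 'test10.dat' splits into ('test', '10'), not ('test1', '0').
PREFIX_NUMBER = re.compile(r'(.*?)(\d+)\.dat$')

def prefix_number_key(name):
    match = PREFIX_NUMBER.match(name)
    if match:
        # Sort by the text prefix, then by the number as an integer.
        return (match.group(1), int(match.group(2)))
    # Names without a trailing number sort by the full name only.
    return (name,)

# Hypothetical filenames with two different prefixes.
files = ['test10.dat', 'run2.dat', 'test2.dat', 'run11.dat', 'test1.dat']
ordered = sorted(files, key=prefix_number_key)
print(ordered)
# ['run2.dat', 'run11.dat', 'test1.dat', 'test2.dat', 'test10.dat']
```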

mgilson
  • True that it's more flexible but it requires re and two lines of code. – BandGap Jun 08 '12 at 10:43
  • @BandGap -- That's the price you pay to be flexible I guess ;-). It could be done in one line -- just do `re.search(regex,x).group(1)`, but in general, I'm not in favor of packing things all into one line just because it's possible. (I don't understand people's obsession with 1-line answers -- They often tend to be unclear imho). – mgilson Jun 08 '12 at 10:48
  • I'm not obsessed with one-liners. In this case I was sure there would be one and I wanted to know it :) – BandGap Jun 08 '12 at 10:50
  • @BandGap -- Yeah, for your simple case, I do agree that eumiro's solution is better (probably more efficient than mine too), but I thought I'd post the more general one just in case. (actually, I've made it even slightly more general at the cost of a few more lines of code ;) – mgilson Jun 08 '12 at 11:01
0

The answer by eumiro is good. I just wanted to add a more flexible approach:

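import re

def natural_sort(data):
    # Digit runs compare as integers, everything else case-insensitively.
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    # The capturing group makes re.split keep the digit runs in the result.
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(data, key=alphanum_key)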
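
A short sketch of what this kind of key produces, written as a standalone function so it runs on its own. The filenames are invented for the example. Because `re.split` is given a capturing group, the digit runs stay in the result, and each key alternates between text and integer parts.

```python
import re

def alphanum_key(name):
    # Split into alternating text and digit runs; digit runs become ints.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r'([0-9]+)', name)]

print(alphanum_key('test10.dat'))  # ['test', 10, '.dat']

# Hypothetical filenames with more than one number and mixed case.
files = ['test10.dat', 'Test2.dat', 'test1.dat', 'img12_v3.png', 'img12_v20.png']
ordered = sorted(files, key=alphanum_key)
print(ordered)
# ['img12_v3.png', 'img12_v20.png', 'test1.dat', 'Test2.dat', 'test10.dat']
```

Unlike a key tied to a fixed prefix length, this one needs no knowledge of where the number sits in the name.
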
Matthias
  • 1
    Duplicate content is not good for the internet, you could just link [here](http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html) and [here](http://stackoverflow.com/questions/4836710/does-python-have-a-built-in-function-for-string-natural-sort) – georg Jun 08 '12 at 11:08
  • @thg435: Had I known that this was floating around on the net I would have set a link. The code was on my computer for ages. – Matthias Jun 08 '12 at 12:34