44

I'm processing some files in a directory and need the files to be sorted numerically. I found some examples on sorting (specifically, using the lambda pattern) at wiki.python.org, and I put this together:

import re

file_names = """ayurveda_1.tif
ayurveda_11.tif
ayurveda_13.tif
ayurveda_2.tif
ayurveda_20.tif
ayurveda_22.tif""".split('\n')

num_re = re.compile(r'_(\d{1,2})\.')

file_names.sort(
    key=lambda fname: int(num_re.search(fname).group(1))
)

Is there a better way to do this?

Zach Young
  • 5
    +1 for a proper question title. – Geoffrey Jan 07 '11 at 07:37
  • 4
    The _right_ way to do what you're doing is to just ask the question in the question bit, then add your answer in an answer bit. Then sit back and wait ... – paxdiablo Jan 07 '11 at 07:38
  • @paxdiablo: Thank you for the instruction... I had read the FAQ to make sure I could answer, just wasn't quite sure about the mechanics. I'll do it right next time. – Zach Young Jan 07 '11 at 07:42
  • No probs, Zachary, it's just that "How do I xyzzy?" is a much more useful question (as in more likely to elicit a wide range of possible answers) than "I have xyzzyed. What do you think of my method?" :-) – paxdiablo Jan 07 '11 at 07:44

6 Answers

69

This is called "natural sorting" or "human sorting" (as opposed to lexicographical sorting, which is the default). Ned B wrote up a quick version of one.

import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def alphanum_key(s):
    """ Turn a string into a list of string and number chunks.
        "z23a" -> ["z", 23, "a"]
    """
    return [ tryint(c) for c in re.split('([0-9]+)', s) ]

def sort_nicely(l):
    """ Sort the given list in the way that humans expect.
    """
    l.sort(key=alphanum_key)

It's similar to what you're doing, but perhaps a bit more generalized.
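
As a quick check of the idea, here is a minimal sketch that applies a natural-sort key to the file names from the question. The name `natural_key` is made up for this example, and it uses `str.isdecimal()` in place of the try/except helper above, so treat it as one possible variant rather than the linked implementation.

```python
import re

def natural_key(s):
    """Split a string into text and integer chunks: "z23a" -> ["z", 23, "a"]."""
    # re.split with a capturing group keeps the digit runs in the result.
    return [int(chunk) if chunk.isdecimal() else chunk
            for chunk in re.split(r'(\d+)', s)]

file_names = [
    "ayurveda_1.tif", "ayurveda_11.tif", "ayurveda_13.tif",
    "ayurveda_2.tif", "ayurveda_20.tif", "ayurveda_22.tif",
]

lexicographic = sorted(file_names)             # default string ordering
natural = sorted(file_names, key=natural_key)  # digit runs compared as ints

print(lexicographic)  # ayurveda_11.tif comes before ayurveda_2.tif
print(natural)        # ayurveda_2.tif comes before ayurveda_11.tif
```

Because the split always alternates text and digit chunks, elements at the same position in two keys have the same type, so the comparison never mixes `str` and `int`.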

Daniel DiPaolo
  • 2
    Thank you, Daniel! This was just what I was looking for. I followed the link you included and down the rabbit hole I went... weeee!!! I learned a little bit about the performance of try/except, and (of course) pre-compiling regexps. :) – Zach Young Jan 07 '11 at 08:00
  • Will this work if we return a generator rather than a list comprehension? – Karl Knechtel Jan 07 '11 at 08:10
  • Doesn't handle negative embedded numbers properly. – martineau Jan 07 '11 at 10:53
  • @martineau: I understand that since the regexp is splitting only at the digit, that any sign character would be in the group before the number. Since this is just an indexed list of files starting at 1, I don't think this will be an issue. – Zach Young Jan 07 '11 at 15:39
  • 2
    @Zachary Young: I suspected that handling negative numbers wasn't important to you, but made the comment only to draw attention to the fact for others for whom it might be (after all, your question just says "numerically"). It's easy to fix, just use `re.split('(-*[0-9]+)', s)` instead...and even more generally, it can be made to handle [signed] real numbers, like `-3.14`, by using `re.split('(-*\d+\.\d*)', s)`. Lastly, if you don't want to define a separate function like `sort_nicely()`, you can always use `tiffFiles.sort(key=alphanum_key)` as you did in the code in your question. – martineau Jan 07 '11 at 23:16
  • 1
    If using real numbers, one should also convert the number to float not int (i.e. make a tryfloat(s) function instead of tryint(s)) – MD004 Oct 06 '15 at 18:10
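
The two comments above (signed numbers, and converting with float instead of int) could be combined roughly as follows. This is only a sketch: the pattern `-?\d+(?:\.\d+)?` and the names `NUMBER_RE` and `signed_float_key` are assumptions made for illustration, not code from the thread.

```python
import re

# Optional minus sign, digits, optional fractional part (an assumed number format).
NUMBER_RE = re.compile(r'(-?\d+(?:\.\d+)?)')

def signed_float_key(s):
    """Split a string into text and float chunks: "t-3.14x" -> ["t", -3.14, "x"]."""
    parts = NUMBER_RE.split(s)
    # With a single capturing group, the matched numbers sit at the odd indices.
    return [float(p) if i % 2 else p for i, p in enumerate(parts)]

readings = ["t2", "t-10", "t0.5", "t-3.14"]
print(sorted(readings, key=signed_float_key))
# ['t-10', 't-3.14', 't0.5', 't2']
```

Note that a hyphen used as a separator, as in `scan-2.tif`, would be read as a minus sign by this pattern, so it only suits names where a leading `-` really means a negative number.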
20

Just use:

tiffFiles.sort(key=lambda var:[int(x) if x.isdigit() else x for x in re.findall(r'[^0-9]|[0-9]+', var)])

This is faster than using try/except.
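
One way to check the speed claim on your own data is a small `timeit` comparison. The sketch below is an assumption-laden micro-benchmark: the helper names `key_try` and `key_isdigit` and the sample list are invented here, and the timings will vary by machine and Python version.

```python
import re
import timeit

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def key_try(s):
    # try/except variant: split on digit runs, convert whatever converts
    return [tryint(c) for c in re.split(r'([0-9]+)', s)]

def key_isdigit(s):
    # isdigit variant, same expression as the one-liner in this answer
    return [int(x) if x.isdigit() else x
            for x in re.findall(r'[^0-9]|[0-9]+', s)]

# Sample data (hypothetical): the question's file names, shuffled
names = ["ayurveda_%d.tif" % n for n in (22, 1, 13, 2, 20, 11)]

for key in (key_try, key_isdigit):
    seconds = timeit.timeit(lambda: sorted(names, key=key), number=2000)
    print(key.__name__, round(seconds, 4))
```

Both keys order this sample the same way. One difference to be aware of: the isdigit variant emits one list element per non-digit character, so names with prefixes of different lengths (for example `ab1` and `a1`) can end up comparing a `str` with an `int`, which raises `TypeError` in Python 3.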

dkmatt0
5

If you are using key= in your sort method, you shouldn't also use cmp, which was removed in Python 3. key should be set to a function that takes a list item as input and returns an object that compares in the order you want the list sorted. It doesn't need to be a lambda function and might be clearer as a standalone function. Regular expressions can also be slow to evaluate compared with plain string methods.

You could try something like the following to isolate and return the integer part of the file name:

def getint(name):
    # partition() returns a (head, sep, tail) tuple; keep only the head
    basename = name.partition('.')[0]
    alpha, num = basename.split('_')
    return int(num)

tiffFiles.sort(key=getint)
Don O'Donnell
  • Thank you, Don. I really appreciate your explanation: very understandable. --Zachary – Zach Young Jan 07 '11 at 08:10
  • @Don O'Donnell I got error _AttributeError: 'tuple' object has no attribute 'split'_ so I modified a bit your code: `basename = name.partition('.')` I change with `basename = name.split('.')` (**Important! Works only for filenames without dots**) and `alpha, num = basename.split('_')` with `alpha, num = basename[0].split('_')` Anyway, you made my day. Thanks! – KnightWhoSayNi Nov 26 '14 at 20:43
5

@April provided a good solution in "How is Pythons glob.glob ordered?" that you could try:

# First, get the files (img_folder and output_image_format are defined elsewhere):
import glob
import re

files = glob.glob1(img_folder, '*' + output_image_format)

# Sort files according to the first run of digits in the filename
files = sorted(files, key=lambda x: float(re.findall(r"(\d+)", x)[0]))
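
`glob.glob1` is an undocumented helper, so here is a sketch of the same idea using `pathlib` instead. The function name `numbered_files` and its parameters are hypothetical, and it assumes that names without any digits should simply sort first instead of raising an error.

```python
import re
from pathlib import Path

def numbered_files(folder, suffix):
    """Return the names in `folder` ending with `suffix`, ordered by their first run of digits."""
    def first_number(name):
        match = re.search(r'\d+', name)
        # Assumption: names without digits sort first instead of failing.
        return int(match.group()) if match else -1

    names = [p.name for p in Path(folder).iterdir() if p.name.endswith(suffix)]
    return sorted(names, key=first_number)
```

A call such as `numbered_files(img_folder, output_image_format)` would then stand in for the two steps above.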
yoonghm
0

str.partition() returns a tuple, so unpack all three parts:

def getint(name):
    (basename, part, ext) = name.partition('.')
    (alpha, num) = basename.split('_')
    return int(num)
Prabhath Kota
  • Did you actually try that? `(a, b, c) = 'ayurveda_11.tif'.split('.'), ValueError: need more than 2 values to unpack` – Zach Young Jul 15 '15 at 15:51
0

This is a modified version of @Don O'Donnell's answer, because I couldn't get it working as-is, but I think it's the best answer here as it's well-explained.

def getint(name):
    _, num = name.split('_')
    num, _ = num.split('.')
    return int(num)

print(sorted(tiffFiles, key=getint))

Changes:

1) The alpha string doesn't get stored, as it's not needed (hence _, num)

2) Use num.split('.') to separate the number from the .tif extension

3) Use sorted instead of list.sort, per https://docs.python.org/2/howto/sorting.html
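
To make the third point concrete, here is a small sketch of how `sorted` and `list.sort` differ. The contents of `tiffFiles` are sample values assumed for the example.

```python
def getint(name):
    # Same helper as above: "ayurveda_11.tif" -> 11
    _, num = name.split('_')
    num, _ = num.split('.')
    return int(num)

# Sample data (assumed); the real list would come from the directory listing
tiffFiles = ["ayurveda_11.tif", "ayurveda_2.tif", "ayurveda_1.tif"]

ordered = sorted(tiffFiles, key=getint)  # returns a new list; tiffFiles is unchanged
print(ordered)
print(tiffFiles)

result = tiffFiles.sort(key=getint)      # sorts in place and returns None
print(result)
print(tiffFiles)
```

`sorted` is convenient when the original order is still needed or when the input is not a list; `list.sort` avoids making a copy.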

StatsSorceress