Sort list of strings by two substrings using lambda function

Question

So I have a list of strings that looks roughly like this:

list = ['file.t00Z.wrff02.grib2', 'file.t00Z.wrff03.grib2', 'file.t00Z.wrff00.grib2',
        'file.t00Z.wrff05.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff01.grib2', 
        'file.t06Z.wrff01.grib2', 'file.t06Z.wrff00.grib2', 'file.t06Z.wrff02.grib2', ...]

I recently asked a question here wherein I learned how to sort my list of strings by substring using a lambda function:

list.sort(key=lambda x: x[x.find('wrff'):])

But now I need to know if there's a way to sort by two different substrings, almost like a composite primary key in a database. I'd like to sort the files first by the two digits following "file.t", and then by the two digits following "wrff". Is there a way that both of these actions can be performed at once?

SOLUTION: I wound up using the two-tuple lambda function sort that user Moses Koledoye recommended below, but I ran into problems when trying to apply this sorting process to groups of filenames with different naming conventions.

In my script I have 3 Python objects which grab files from unique data directories and form a list (like the one above) containing the files. Each of the objects grabs files with a different naming convention, and each group of files has a different number of digit groups in its names.

To handle this without adding complexity, I decided to use the natsort module that user Jared Goguen suggested, and it worked very nicely.

Moses Koledoye · Accepted Answer · 2017-09-14T21:04:38.647

5

You can use re.findall to pick out the first two groups of digits and then sort on them as a 2-tuple of integers:

import re

lst = sorted(lst, key=lambda x: tuple(int(i) for i in re.findall(r'\d+', x)[:2]))
print(lst)
# ['file.t00Z.wrff00.grib2', 'file.t00Z.wrff01.grib2', 'file.t00Z.wrff02.grib2', 
#  'file.t00Z.wrff03.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff05.grib2', 
#  'file.t06Z.wrff00.grib2', 'file.t06Z.wrff01.grib2', 'file.t06Z.wrff02.grib2', ...]

This sorts first by the number after file.t and then by the number after wrff.
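
For readers who prefer a named function over the inline lambda, here is a minimal sketch of the same idea. The helper name `leading_numbers` and the short sample list are made up for illustration; the approach still assumes the first two digit runs in each name are the ones you want to sort on.

```python
import re

def leading_numbers(name, count=2):
    """Return the first `count` runs of digits in `name` as a tuple of ints."""
    # Hypothetical helper: it takes digit runs in order of appearance,
    # so it does not check that they follow 'file.t' or 'wrff'.
    return tuple(int(n) for n in re.findall(r'\d+', name)[:count])

files = ['file.t06Z.wrff01.grib2', 'file.t00Z.wrff02.grib2',
         'file.t00Z.wrff00.grib2', 'file.t06Z.wrff00.grib2']

# Tuples compare element by element, so the first number is the primary
# sort field and the second number breaks ties.
result = sorted(files, key=leading_numbers)
print(result)
# ['file.t00Z.wrff00.grib2', 'file.t00Z.wrff02.grib2',
#  'file.t06Z.wrff00.grib2', 'file.t06Z.wrff01.grib2']
```

Converting to int matters once the numbers are not zero-padded to the same width, since '10' sorts before '9' as a string.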

edited Sep 14 '17 at 21:04

answered Sep 14 '17 at 21:00

Moses Koledoye


@ChristianDean The answer addresses that. See `[:2]`. – Moses Koledoye Sep 14 '17 at 21:03
1

Ah, I see what you mean. Nice answer. +1 Sorry about that. Like I said, my brain was fried. – Christian Dean Sep 14 '17 at 21:04
3

@ChristianDean does have a point though, this will capture stray 1-digit sequences and sequences not after the requested substrings (which may or may not be an issue). – Jared Goguen Sep 14 '17 at 21:11

Jared Goguen · Answer 2 · 2017-09-14T21:09:46.700

4

It seems like this is approaching the area where regular expressions are useful. Here's one solution which captures the two subsequences of digits that you require.

import re

get_indices = lambda s: re.match(r'^.*?file\.t([0-9]{2}).*?wrff([0-9]{2}).*$', s).groups()
sorted(file_names, key=get_indices)
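
A slightly more defensive variant of the same idea, as a sketch: it converts the captured digits to integers and raises a clear error when a name does not match, instead of failing with an AttributeError on `None`. The names `PATTERN`, `two_field_key`, and the `run2/` prefix in the sample paths are hypothetical.

```python
import re

# Anchor the capture groups to the literal markers, so digits elsewhere
# in the path (for example in a directory name) are ignored.
PATTERN = re.compile(r'file\.t(\d{2}).*?wrff(\d{2})')

def two_field_key(name):
    """Return (number after 'file.t', number after 'wrff') as ints."""
    match = PATTERN.search(name)
    if match is None:
        # Assumption: a non-matching name is a mistake worth surfacing.
        raise ValueError(f'unexpected file name: {name!r}')
    return tuple(int(group) for group in match.groups())

names = ['run2/file.t06Z.wrff00.grib2',
         'run2/file.t00Z.wrff01.grib2',
         'run2/file.t00Z.wrff00.grib2']

ordered = sorted(names, key=two_field_key)
print(ordered)
# ['run2/file.t00Z.wrff00.grib2', 'run2/file.t00Z.wrff01.grib2',
#  'run2/file.t06Z.wrff00.grib2']
```

Note that the stray `2` in `run2/` does not affect the result here, which is the case the plain "first two digit runs" approach would get wrong.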

Or, in situations like these, I'm often trying to naturally sort file names. In those cases, I have the following set of functions in a library file.

import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s

def getchunks(string):
    return [tryint(c) for c in re.split('([0-9]+)', string)]

def sort_naturally(l):
    return sorted(l, key=getchunks)
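
To see what the chunking does, here is a small self-contained sketch of the same technique with made-up file names. The function name `natural_key` is hypothetical, and it uses the position in the split result rather than a try/except to decide which chunks are numbers.

```python
import re

def natural_key(text):
    """Split text into alternating non-digit and digit chunks."""
    # With a capturing group, re.split keeps the separators, so the
    # digit runs always land at the odd indices of the result.
    parts = re.split(r'(\d+)', text)
    return [int(part) if index % 2 else part
            for index, part in enumerate(parts)]

names = ['frame10.png', 'frame2.png', 'frame1.png']

print(natural_key('frame10.png'))  # ['frame', 10, '.png']

plain = sorted(names)
natural = sorted(names, key=natural_key)
print(plain)    # ['frame1.png', 'frame10.png', 'frame2.png']
print(natural)  # ['frame1.png', 'frame2.png', 'frame10.png']
```

Because every key starts with a (possibly empty) string chunk and then alternates, strings are only ever compared with strings and ints with ints, so the mixed lists sort without a TypeError in Python 3.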

The natsort library implements natural sorting more comprehensively, if you're looking for something more heavy-duty.

edited Sep 14 '17 at 21:09

answered Sep 14 '17 at 21:04

Jared Goguen


Wow the natsort package is a really neat suggestion. Thanks! – nat5142 Sep 15 '17 at 15:27
The re.split feature is really handy. Using the tryint(s) defined here, and sorting a `glob.glob("/usr/share/icons/"+theme+"/*/"+category+"/"+name+".*")` I have my key function `return [tryint(c) for c in re.split('([0-9]+)',x.split("/")[5])]` which works. – bgStack15 Jun 14 '19 at 02:32
