So I have a list of strings that looks roughly like this:

lst = ['file.t00Z.wrff02.grib2', 'file.t00Z.wrff03.grib2', 'file.t00Z.wrff00.grib2',
       'file.t00Z.wrff05.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff01.grib2',
       'file.t06Z.wrff01.grib2', 'file.t06Z.wrff00.grib2', 'file.t06Z.wrff02.grib2', ...]

I recently asked a question here and learned how to sort my list of strings by a substring using a lambda function:

lst.sort(key=lambda x: x[x.find('wrff'):])

But now I need to know if there's a way to sort by two different substrings, almost like a composite primary key in a database. I'd like to sort the files first by the two digits following "file.t", and then by the two digits following "wrff". Is there a way that both of these actions can be performed at once?

SOLUTION: I wound up using the two-tuple lambda function sort that user Moses Koledoye recommended below, but I ran into problems when trying to apply this sorting process to groups of filenames with different naming conventions.

In my script I have 3 Python objects that each grab files from a separate data directory and build a list (like the one above) of those files. Each object grabs files with a different naming convention, and the number of digit groups in the names varies from one group of files to the next.

To handle this without adding complexity, I decided to use the natsort module that user Jared Goguen suggested, and it worked very nicely.

nat5142

2 Answers

You can use re.findall to pull out the first two runs of digits and use them, as a 2-tuple of integers, for the sort key:

import re

lst = sorted(lst, key=lambda x: tuple(int(i) for i in re.findall(r'\d+', x)[:2]))
print(lst)
# ['file.t00Z.wrff00.grib2', 'file.t00Z.wrff01.grib2', 'file.t00Z.wrff02.grib2', 
#  'file.t00Z.wrff03.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff05.grib2', 
#  'file.t06Z.wrff00.grib2', 'file.t06Z.wrff01.grib2', 'file.t06Z.wrff02.grib2', ...]

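This sorts by the first two numbers in each name: the one after file.t, then the one after wrff. The [:2] slice drops the trailing 2 that re.findall also picks up from grib2.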
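To see what the key actually looks like, here is a small sketch that prints the tuple next to each name. The helper name two_number_key is made up for illustration, and the list is a shortened sample of the names from the question:

```python
import re

# Shortened sample of the file names from the question.
names = [
    'file.t06Z.wrff01.grib2',
    'file.t00Z.wrff02.grib2',
    'file.t06Z.wrff00.grib2',
    'file.t00Z.wrff00.grib2',
]

def two_number_key(name):
    """Return the first two digit groups of `name` as a tuple of ints."""
    # re.findall gives ['06', '01', '2'] for 'file.t06Z.wrff01.grib2';
    # the [:2] slice keeps only the first two groups.
    return tuple(int(n) for n in re.findall(r'\d+', name)[:2])

ordered = sorted(names, key=two_number_key)
for name in ordered:
    print(two_number_key(name), name)
# (0, 0) file.t00Z.wrff00.grib2
# (0, 2) file.t00Z.wrff02.grib2
# (6, 0) file.t06Z.wrff00.grib2
# (6, 1) file.t06Z.wrff01.grib2
```

Python compares tuples element by element, so the second number only matters when the first numbers are equal, which is the "composite key" behaviour asked for.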

Moses Koledoye

It seems like this is approaching the area where regular expressions are useful. Here's one solution which captures the two subsequences of digits that you require.

import re

def get_indices(s):
    # Capture the two digits after "file.t" and the two digits after "wrff".
    return re.match(r'^.*?file\.t([0-9]{2}).*?wrff([0-9]{2}).*$', s).groups()

sorted(file_names, key=get_indices)
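One thing to watch with this approach: re.match returns None for a name that doesn't fit the pattern, so .groups() would raise AttributeError. A rough sketch of a more forgiving key follows. The names PATTERN and two_field_key, the stray file notes.txt, and the choice to push unmatched names to the end are all assumptions for illustration, not part of the original code:

```python
import re

# Assumes the "file.tNNZ.wrffNN.grib2" convention from the question.
PATTERN = re.compile(r'file\.t([0-9]{2}).*?wrff([0-9]{2})')

def two_field_key(name):
    """Sort key: (matched flag, digits after 'file.t', digits after 'wrff', name)."""
    match = PATTERN.search(name)
    if match is None:
        # Assumption: names that don't fit the convention sort last.
        return (1, 0, 0, name)
    t_digits, wrff_digits = match.groups()
    return (0, int(t_digits), int(wrff_digits), name)

# 'notes.txt' is a hypothetical stray file that doesn't match the pattern.
file_names = [
    'notes.txt',
    'file.t06Z.wrff00.grib2',
    'file.t00Z.wrff01.grib2',
    'file.t00Z.wrff00.grib2',
]
result = sorted(file_names, key=two_field_key)
print(result)
# ['file.t00Z.wrff00.grib2', 'file.t00Z.wrff01.grib2',
#  'file.t06Z.wrff00.grib2', 'notes.txt']
```

Every key has the same shape (int, int, int, str), so the tuples always compare cleanly.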

Or, in situations like these, I'm often trying to naturally sort file names. In those cases, I have the following set of functions in a library file.

import re

def tryint(s):
    try:
        return int(s)
    except ValueError:
        # Not a run of digits; keep the text chunk as-is.
        return s

def getchunks(string):
    return [tryint(c) for c in re.split('([0-9]+)', string)]

def sort_naturally(l):
    return sorted(l, key=getchunks)
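As a quick illustration of what the natural sort buys you over a plain sort, here is a self-contained sketch. It folds the helpers above into a single function called natural_key (a name chosen for this example), and the frame*.png names are hypothetical:

```python
import re

def natural_key(text):
    """Split text into alternating string and integer chunks for sorting."""
    # With a capturing group, re.split keeps the digit runs:
    # 'frame10.png' -> ['frame', '10', '.png'] -> ['frame', 10, '.png']
    return [int(c) if c.isdigit() else c for c in re.split(r'([0-9]+)', text)]

# Hypothetical file names without zero padding.
frames = ['frame10.png', 'frame2.png', 'frame1.png']

plain = sorted(frames)
natural = sorted(frames, key=natural_key)

print(plain)    # ['frame1.png', 'frame10.png', 'frame2.png']
print(natural)  # ['frame1.png', 'frame2.png', 'frame10.png']
```

Because re.split with a capturing group always alternates text and digit chunks, starting with a (possibly empty) text chunk, integers are only ever compared with integers and strings with strings.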

The library natsort was written to naturally sort on a more comprehensive level if you're looking for something more heavy duty.

Jared Goguen
  • Wow the natsort package is a really neat suggestion. Thanks! – nat5142 Sep 15 '17 at 15:27
  • The re.split feature is really handy. Using the tryint(s) defined here, and sorting a `glob.glob("/usr/share/icons/"+theme+"/*/"+category+"/"+name+".*")` I have my key function `return [tryint(c) for c in re.split('([0-9]+)',x.split("/")[5])]` which works. – bgStack15 Jun 14 '19 at 02:32