
I have a script whose purpose is to sort and process spatial dataset files which are constantly being downloaded onto a server. My list currently looks roughly like this:

list = ['file.t00Z.wrff02.grib2', 'file.t00Z.wrff03.grib2', 'file.t00Z.wrff00.grib2', 'file.t00Z.wrff05.grib2', 'file.t00Z.wrff04.grib2', 'file.t00Z.wrff01.grib2', 'file.t06Z.wrff01.grib2', 'file.t06Z.wrff00.grib2', 'file.t06Z.wrff02.grib2', ...]

As you can see, each file has a specific naming convention.

Later in the script, the files in this list will be processed sequentially, but I need to process them in order of the time designated by the two digits following "wrff" in each filename (00, 01, 02...).

I already have a regular expression that removes any files from the list whose two digits following "file.t" don't match, as required. But is there an easy way to sort list elements by a substring?

Note: I would choose to simply sort these files by modification time, but they often appear in the data directory out of order.

nat5142
  • This isn't a duplicate of the mentioned question since the desired order is not always the natural lexicographic order. – dugup Sep 11 '17 at 22:46
  • @cᴏʟᴅsᴘᴇᴇᴅ This is not a duplicate because `Sorting dictionary with alphanumeric keys in natural order [duplicate]` involves the items in a dictionary and `Does Python have a built in function for string natural sort?` does not answer the OP's question because it does not involve lexicographic order. Finally, both "duplicate" answers are beyond the scope of this post. – Ajax1234 Sep 11 '17 at 22:52
  • @dugup It is unreasonable to assume questions of this kind haven't been asked before. Rather than voting to reopen, please find the appropriate duplicate (if the original one was incorrectly marked). – cs95 Sep 11 '17 at 22:59
  • @Ajax1234 I think the new duplicate should address OP's question sufficiently. All they need to do is modify the lambda. – cs95 Sep 11 '17 at 22:59
  • @cᴏʟᴅsᴘᴇᴇᴅ I do not believe so. The sample input for the user in `Sort list of strings by a part of the string` is 'variable1 (name1)'. How is that the same as 'file.t00Z.wrff02.grib2'? Assuming the OP is aware of advanced lambda functions is also unreasonable. – Ajax1234 Sep 11 '17 at 23:04
  • @Ajax1234 I don't see how the answer in the duplicate is so much different than the one here. Just because the questions are different, does not mean they need not have the same/similar solution, and can be marked as such. – cs95 Sep 11 '17 at 23:06
  • @cᴏʟᴅsᴘᴇᴇᴅ I guess our definitions of duplicates are different :) – Ajax1234 Sep 11 '17 at 23:10
  • @Ajax1234 From the horse's mouth: https://stackoverflow.com/questions/41983180/is-the-empty-tuple-in-python-a-constant#comment71144721_41983180 – cs95 Sep 11 '17 at 23:11

1 Answer


You can use `sorted` or `list.sort` and pass, as the `key` argument, a lambda function that extracts the part of each filename you want to sort by.

sorted_list = sorted(list, key=lambda f: f[f.find('wrff'): f.find('wrff') + 6])
dugup
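As a rough illustration of the key-function idea in the answer above, here is a minimal sketch run on a few of the sample names from the question. The names `filenames` and `forecast_hour` are made up for this example, and the sketch assumes every name contains "wrff" followed by exactly two digits. It converts the digits to an integer so the comparison is numeric rather than textual.

```python
# Sample names copied from the question; `filenames` and `forecast_hour`
# are illustrative names, not part of the original script.
filenames = [
    'file.t00Z.wrff02.grib2',
    'file.t00Z.wrff03.grib2',
    'file.t00Z.wrff00.grib2',
    'file.t00Z.wrff05.grib2',
    'file.t00Z.wrff04.grib2',
    'file.t00Z.wrff01.grib2',
]


def forecast_hour(name):
    """Return the two digits after 'wrff' as an int.

    Assumes 'wrff' is present and is followed by two digits.
    """
    start = name.find('wrff') + len('wrff')
    return int(name[start:start + 2])


# sorted() returns a new list; filenames.sort(key=forecast_hour) would sort in place.
sorted_files = sorted(filenames, key=forecast_hour)
print(sorted_files)
```

Using a variable name other than `list` also avoids shadowing Python's built-in `list` type.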
  • Awesome. I'll give this a shot in a little bit! – nat5142 Sep 11 '17 at 22:53
  • Worked beautifully when I used this: `list.sort(key=lambda x: x[x.find('wrff'): x.find('.grib2')])` – nat5142 Sep 12 '17 at 19:36
  • Do you know how I'd use this format to match characters which appear at the end of the string? – nat5142 Sep 13 '17 at 02:18
  • If you want to use more complicated logic you can create a regular function instead of a lambda function and pass that as the key. Creating a function that uses regular expressions to extract the part of the string you are interested in will probably be your best bet – dugup Sep 13 '17 at 09:54
  • I played around with it last night and found that `list.sort(key=lambda x: x[x.find('wrff'):])` sorts by everything from 'wrff' to the end of the string. Thanks for your help!! – nat5142 Sep 13 '17 at 22:41
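
The suggestion in the comments to use a regular function with a regular expression might look something like the sketch below. The pattern, the name `wrff_key`, and the decision to sort non-matching names last are all assumptions made for illustration. The `$` anchor ties the match to the end of the string, which is one way to match characters that appear at the end.

```python
import re

# Assumed pattern: 'wrff', two digits, then '.grib2' at the very end of the name.
WRFF_PATTERN = re.compile(r'wrff(\d{2})\.grib2$')


def wrff_key(name):
    """Sort key: names that match come first, ordered by the two digits."""
    match = WRFF_PATTERN.search(name)
    if match is None:
        # Illustrative choice: names that do not match sort after all others.
        return (1, 0)
    return (0, int(match.group(1)))


# Hypothetical sample list; 'notes.txt' stands in for a non-matching name.
files = [
    'file.t06Z.wrff01.grib2',
    'file.t06Z.wrff00.grib2',
    'notes.txt',
    'file.t06Z.wrff02.grib2',
]
files.sort(key=wrff_key)
print(files)
```

Returning a tuple keeps the key comparable for every name, so the sort does not raise an error when a name lacks the expected suffix.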