I have a list

a = ['d2da1.png','d2da11.png','d2da2.png','d2da111.png','d2da22.png']

that I want to sort automatically by the last number before the period (.) in each element. This number may be an integer of any length (e.g. 1, 11, 2, 111, 22). The output sought is

a = ['d2da1.png','d2da2.png','d2da11.png','d2da22.png','d2da111.png']

Plain sorted fails here because it compares the strings lexicographically, which places 'd2da11.png' before 'd2da2.png', and brute forcing may be time-intensive for very large lists. Is there a way to sort the strings above by the last full number in each string to produce the sought list?
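
As a quick illustration of the failure described above (a minimal sketch; the variable name `lexicographic` is only for this example):

```python
a = ['d2da1.png', 'd2da11.png', 'd2da2.png', 'd2da111.png', 'd2da22.png']

# Plain sorted() compares strings character by character, so 'd2da11.png'
# and 'd2da111.png' land ahead of 'd2da2.png' because '1' < '2'.
lexicographic = sorted(a)
print(lexicographic)
```

The numeric order only appears once the trailing number is compared as an integer rather than as text.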

Mathews24
  • Is the "prefix" before the number to be sorted the same in all file names? It appears to be in your example, or are the prefixes diverse? – AirSquid May 11 '20 at 19:57
  • There is a module, [natsort](https://stackoverflow.com/questions/4836710/is-there-a-built-in-function-for-string-natural-sort/18415320#18415320) that sorts in the order you need. – Chris Charley May 11 '20 at 20:39
  • @JeffH The prefix is arbitrary, but for a particular list, it would be constant (e.g. fixed to `'d2da'` for the above list). – Mathews24 May 11 '20 at 21:53
  • OK. Well, my answer below stands. :) You have several options here. Each might have problems if the prefix portion has additional digits in it, or if some names do and some don't. – AirSquid May 11 '20 at 22:03

3 Answers

If the prefix/suffix remains the same, this works and I think is pretty clear:

In [10]: a
Out[10]: ['d2da1.png', 'd2da11.png', 'd2da2.png', 'd2da111.png', 'd2da22.png']

In [11]: def chop(s):
    ...:     return int(s.replace('d2da','').replace('.png',''))
    ...:

In [12]: [chop(t) for t in a]    # a little side-test on chop()
Out[12]: [1, 11, 2, 111, 22]

In [13]: a.sort(key=chop)

In [14]: a
Out[14]: ['d2da1.png', 'd2da2.png', 'd2da11.png', 'd2da22.png', 'd2da111.png']
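
If the prefix changes from one list to another, the same idea might be wrapped in a small key factory. This is only a sketch under that assumption: `make_key` is a hypothetical helper, not something from the session above, and it slices the known prefix and suffix off instead of calling `replace`, so a repeated `'d2da'` elsewhere in the name is left alone.

```python
def make_key(prefix, suffix):
    """Build a sort key that strips a known prefix and suffix and parses the rest as an int."""
    def key(name):
        # Fail loudly on names that do not follow the expected pattern.
        if not (name.startswith(prefix) and name.endswith(suffix)):
            raise ValueError(f"unexpected file name: {name!r}")
        # Keep only the text between the prefix and the suffix.
        return int(name[len(prefix):len(name) - len(suffix)])
    return key

a = ['d2da1.png', 'd2da11.png', 'd2da2.png', 'd2da111.png', 'd2da22.png']
a_sorted = sorted(a, key=make_key('d2da', '.png'))
print(a_sorted)
```

The prefix and suffix are passed once per list, which matches the case where they are arbitrary but constant within a list.
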
AirSquid

You can use a regular expression:

import re

a = ['d2da1.png','d2da11.png','d2da2.png','d2da111.png','d2da22.png']

a.sort(key=lambda x : int(re.search(r'(\d+)\.', x).group(1)))
print(a)
# ['d2da1.png', 'd2da2.png', 'd2da11.png', 'd2da22.png', 'd2da111.png']
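
One caveat: `re.search(r'(\d+)\.', x)` returns the first run of digits followed by a period, which is the intended number only when the name contains a single period. A possible variation, sketched here with a hypothetical helper `last_number` and made-up file names, anchors the match to the final extension:

```python
import re

# Digits immediately before the last period, followed by an extension
# that contains no further periods.
_LAST_NUMBER = re.compile(r'(\d+)\.[^.]+$')

def last_number(name):
    """Return the integer that sits right before the final extension."""
    m = _LAST_NUMBER.search(name)
    if m is None:
        raise ValueError(f"no trailing number in {name!r}")
    return int(m.group(1))

# Illustrative names with an extra period in the prefix.
names = ['v1.2_img10.png', 'v1.2_img9.png', 'v1.2_img100.png']
names_sorted = sorted(names, key=last_number)
print(names_sorted)
```

With the unanchored pattern, each of these names would yield the key 1 (from `v1.`), so the list would keep its input order.
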
kederrac

You can combine itertools.takewhile with sorted:

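import itertools as it
a = ['d2da1.png','d2da11.png','d2da2.png','d2da111.png','d2da22.png']
# key: the digits just before the 4-character '.png' suffix, read backward and then reversed
sorted(a, key=lambda x: int(''.join(it.takewhile(str.isdigit, x[-5:0:-1]))[::-1]))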

Result:

['d2da1.png', 'd2da2.png', 'd2da11.png', 'd2da22.png', 'd2da111.png']

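The slice x[-5:0:-1] starts at the character just before '.png' and walks backward through the name, so it assumes a four-character suffix such as '.png'.
takewhile collects characters for as long as they are digits, ''.join forms the string, [::-1] restores the original digit order, and int converts it.
Using that value as the sort key produces the desired result.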
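
If the extensions vary in length, the fixed `-5` offset no longer points at the last digit. A possible variation, sketched with a hypothetical helper `trailing_number` and made-up file names, splits on the last period first and then applies the same `takewhile` idea:

```python
import itertools as it

def trailing_number(name):
    """Return the run of digits that ends right before the last period."""
    # Text before the last period; an empty string if the name has no period,
    # in which case int('') below raises ValueError.
    stem = name.rpartition('.')[0]
    # Walk the stem backward and keep characters while they are digits.
    digits = ''.join(it.takewhile(str.isdigit, reversed(stem)))
    # The digits were collected back to front, so reverse them again.
    return int(digits[::-1])

# Illustrative names whose extensions differ in length.
mixed = ['d2da11.jpeg', 'd2da2.png', 'd2da1.tiff']
mixed_sorted = sorted(mixed, key=trailing_number)
print(mixed_sorted)
```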

r.ook