13

For example, I have a list:

my_list= ['image101.jpg', 'image2.jpg', 'image1.jpg']

and

my_list.sort()

gives me

['image1.jpg', 'image101.jpg', 'image2.jpg']

but of course I need

['image1.jpg', 'image2.jpg', 'image101.jpg']

How can this be done?

mrgloom

4 Answers

20

list.sort accepts an optional key function. Each item is passed to that function, and its return value is used to compare items instead of the original values.

>>> my_list= ['image101.jpg', 'image2.jpg', 'image1.jpg']
>>> my_list.sort(key=lambda x: int(''.join(filter(str.isdigit, x))))
>>> my_list
['image1.jpg', 'image2.jpg', 'image101.jpg']

filter and str.isdigit are used to extract the digits, which int then converts to a number:

>>> ''.join(filter(str.isdigit, 'image101.jpg'))
'101'
>>> int(''.join(filter(str.isdigit, 'image101.jpg')))
101
  • ''.join(..) is not required in Python 2.x, where filter applied to a str already returns a str.
falsetru
  • Just wanted to mention this: it won't work if the file name is something like `image21_20160328.jpg`. The number it will extract is `2120160328`. – JRodDynamite Mar 28 '16 at 09:58
  • @JasonEstibeiro, you're right. In that case you need to capture each group of digits and convert them separately, using something like `list(map(int, re.findall(r'\d+', x)))` – falsetru Mar 28 '16 at 10:01
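The suggestion in the comments above can be sketched as a key function that returns one integer per run of digits. This is only an illustration: the name `digit_runs_key` and the sample file names are made up here, and it assumes every name contains at least one digit and that the extension itself has no digits (an extension such as `.mp4` would add a trailing `4` to the key).

```python
import re

def digit_runs_key(name):
    # Each maximal run of digits becomes its own int, so
    # 'image21_20160328.jpg' maps to (21, 20160328) rather than 2120160328.
    return tuple(int(run) for run in re.findall(r"\d+", name))

# Hypothetical sample names, not taken from the question.
files = ["image21_20160328.jpg", "image3_20160401.jpg", "image21_20160101.jpg"]
sorted_files = sorted(files, key=digit_runs_key)
print(sorted_files)
```

Tuples compare element by element, so the first number decides the order and the second number only breaks ties.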
9

Use a regex to pull the number from the string and cast to int:

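import re

r = re.compile(r"\d+")  # raw string, so \d is not treated as a string escape
l = my_list = ['image101.jpg', 'image2.jpg', 'image1.jpg']
# search() finds the first run of digits; group() returns the matched text
l.sort(key=lambda x: int(r.search(x).group()))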

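Or use a more specific regex that also requires the dot before the extension:

import re

r = re.compile(r"(\d+)\.")
l = my_list = ['image101.jpg', 'image2.jpg', 'image1.jpg']
# group(1) returns only the digits; group() would include the trailing dot and make int() fail
l.sort(key=lambda x: int(r.search(x).group(1)))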

Both give the same output for your example input:

['image1.jpg', 'image2.jpg', 'image101.jpg']

If you are sure of the extension you can use a very specific regex:

r = re.compile(r"(\d+)\.jpg$")
l.sort(key=lambda x: int(r.search(x).group(1)))
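One thing the lambdas above do not cover is a name with no digits at all: `r.search(x)` then returns `None` and `.group()` raises `AttributeError`. A minimal sketch of a more defensive key follows. The helper name `first_number_key`, the extra `readme.txt` entry, and the choice to sort such names last are all assumptions made for illustration.

```python
import re

_DIGITS = re.compile(r"\d+")

def first_number_key(name):
    match = _DIGITS.search(name)
    if match is None:
        # No digits: the leading 1 places these names after all numbered ones.
        return (1, 0, name)
    # Leading 0 groups numbered names first; the name itself breaks ties.
    return (0, int(match.group()), name)

# 'readme.txt' is a hypothetical entry added to show the fallback.
files = ["image101.jpg", "readme.txt", "image2.jpg", "image1.jpg"]
result = sorted(files, key=first_number_key)
print(result)
```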
Padraic Cunningham
7

If you want to do this in the general case, I would try a natural sorting package like natsort.

from natsort import natsorted
my_list = ['image101.jpg', 'image2.jpg', 'image1.jpg']
natsorted(my_list)

Returns:

['image1.jpg', 'image2.jpg', 'image101.jpg']

You can install it with pip: pip install natsort
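If installing a package is not an option, the general idea of natural sorting can be approximated with the standard library. The sketch below is not how natsort is implemented and handles far fewer cases (no signs, decimals, or locale rules); `natural_key` is a name chosen here for illustration.

```python
import re

def natural_key(text):
    # re.split with a capturing group keeps the digit runs, so the result
    # alternates: text, digits, text, digits, ..., text.
    parts = re.split(r"(\d+)", text)
    # Odd positions are always digit runs; compare them as ints.
    # Even positions are plain text; compare them case-insensitively.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

my_list = ['image101.jpg', 'image2.jpg', 'image1.jpg']
result = sorted(my_list, key=natural_key)
print(result)
```

Because text and numbers always land at the same positions in every key, Python never has to compare a str with an int.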

reupen
4

Actually, you don't need a regex pattern. If every name has the same prefix and a three-letter extension, you can slice the number out directly:

>>> 'image101.jpg'[5:-4]
'101'

Solution:

>>> sorted(my_list, key=lambda x: int(x[5:-4]))
['image1.jpg', 'image2.jpg', 'image101.jpg']
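The slice `[5:-4]` hard-codes the length of `image` and of `.jpg`. A slightly more tolerant sketch, assuming only that every name starts with a known prefix, lets `os.path.splitext` remove the extension whatever its length. The helper name `index_from_name`, the `PREFIX` constant, and the mixed-extension sample list are assumptions for illustration.

```python
import os

PREFIX = "image"  # assumed common prefix of every file name

def index_from_name(name, prefix=PREFIX):
    # splitext drops the extension regardless of its length (.jpg, .jpeg, ...).
    stem, _ext = os.path.splitext(name)
    if not stem.startswith(prefix):
        raise ValueError("unexpected file name: %r" % name)
    # What remains after the prefix is expected to be the number.
    return int(stem[len(prefix):])

# Hypothetical names with different extensions.
files = ["image101.jpeg", "image2.png", "image1.jpg"]
result = sorted(files, key=index_from_name)
print(result)
```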
Adem Öztaş