7

I have a list paths_list which contains the paths of the files (images) in a particular folder. Example:

['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg', 
'/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg', 
'/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg', 
'/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg', 
'/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg', 
'/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']

I want to sort them in the order [/1.jpg, /2.jpg, ..., /12.jpg]. Neither sorting by length nor sorting alphabetically helps. What should be done here?

Alex Waygood
Kancha
  • So **how** do you want to sort? What is the rule? If there is a rule, it can be done. – Ma0 Jun 23 '17 at 12:16

7 Answers

19

You can use sorted with a lambda as the key. For the sort key, use os.path to first pull out just the file name (using basename), then strip the extension from it (using splitext).

Lastly, convert the result to int so that you sort numerically instead of lexicographically.

>>> import os
>>> l = ['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg', '/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg', '/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg', '/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg', '/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg', '/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']
>>> sorted(l, key=lambda i: int(os.path.splitext(os.path.basename(i))[0]))
['/home/username/images/s1/1.jpg',
 '/home/username/images/s1/2.jpg',
 '/home/username/images/s1/3.jpg',
 '/home/username/images/s1/4.jpg',
 '/home/username/images/s1/5.jpg',
 '/home/username/images/s1/6.jpg',
 '/home/username/images/s1/7.jpg',
 '/home/username/images/s1/8.jpg',
 '/home/username/images/s1/9.jpg',
 '/home/username/images/s1/10.jpg',
 '/home/username/images/s1/11.jpg',
 '/home/username/images/s1/12.jpg']
Innat
Cory Kramer
  • is there a reason for replacing the `split('.')` with `splitext`? Multiple dots in the basename? – Ma0 Jun 23 '17 at 12:20
  • I just figured that since I was already using `os` for `basename`, I'd use `splitext` to make it clearer what that split was doing. `split('.')` would have worked just fine too. And yes, that is a good catch: `splitext` also works properly with multiple dots in the filename, although that does not matter in this particular case, since the names have to be numbers they can sort numerically. – Cory Kramer Jun 23 '17 at 12:22
  • I guess you should also add the point that, unlike `.sort()`, `sorted()` doesn't sort the list in place, so we need to save the result. For example: `l = sorted(l, key=lambda i: int(os.path.splitext(os.path.basename(i))[0]))` – Kancha Jun 23 '17 at 21:04
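A minimal sketch of the difference raised in the comment above, using a shortened, hypothetical list of paths: `sorted()` returns a new list and leaves its input untouched, while `list.sort()` reorders the list in place and returns `None`. The helper name `numeric_stem` is introduced here for illustration only.

```python
import os


def numeric_stem(path):
    """Return the file name without its extension as an int ('.../10.jpg' -> 10)."""
    return int(os.path.splitext(os.path.basename(path))[0])


# Shortened, hypothetical version of the list from the question.
paths = ['/home/username/images/s1/10.jpg',
         '/home/username/images/s1/2.jpg',
         '/home/username/images/s1/1.jpg']

# sorted() builds a new list; the order of `paths` itself is unchanged.
ordered = sorted(paths, key=numeric_stem)

# list.sort() reorders the list in place and returns None.
in_place = list(paths)  # work on a copy so `paths` stays unsorted for comparison
result = in_place.sort(key=numeric_stem)

print(ordered)
print(in_place == ordered, result)
```

Whether to rebind the name (`l = sorted(l, ...)`) or call `l.sort(...)` is mostly a matter of whether the original order is still needed afterwards.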
12

Use natural sorting (see this question) via the third-party natsort package: it gives clean code and is good practice when sorting strings that contain numbers.

from natsort import natsorted
l = ['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg', '/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg', '/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg', '/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg', '/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg', '/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']
natsorted(l)

gives

['/home/username/images/s1/1.jpg',
'/home/username/images/s1/2.jpg',
'/home/username/images/s1/3.jpg',
'/home/username/images/s1/4.jpg',
'/home/username/images/s1/5.jpg',
'/home/username/images/s1/6.jpg',
'/home/username/images/s1/7.jpg',
'/home/username/images/s1/8.jpg',
'/home/username/images/s1/9.jpg',
'/home/username/images/s1/10.jpg',
'/home/username/images/s1/11.jpg',
'/home/username/images/s1/12.jpg']

Natural sorting orders strings the way a person would read them (alphabetically, with embedded numbers compared by their numeric value), rather than character by character as a plain string comparison does.
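If installing natsort is not an option, the idea can be approximated with the standard library. The following is only a rough sketch, not natsort's actual implementation: the helper `natural_key` is a hypothetical name, and it handles plain runs of ASCII digits only (no signs, decimals, or locale rules).

```python
import re


def natural_key(text):
    """Split text into alternating non-digit and digit runs.

    re.split with a capturing group keeps the digit runs, and they always
    land at the odd indices, so those are converted to int for comparison.
    """
    parts = re.split(r"([0-9]+)", text)
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


# Shortened, hypothetical version of the list from the question.
paths = ['/home/username/images/s1/10.jpg',
         '/home/username/images/s1/2.jpg',
         '/home/username/images/s1/1.jpg']

natural_sorted = sorted(paths, key=natural_key)
print(natural_sorted)
```

Because every key alternates string, int, string, and so on, two keys are always compared type against type, which avoids a `TypeError` from comparing `str` with `int`.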

VinceP
3

Inspired by @Cory Kramer's answer, you can use the pathlib library and sort the paths numerically by their file stem:

from pathlib import Path

a = ['/home/username/images/s1/4.jpg', 
     '/home/username/images/s1/7.jpg', 
     '/home/username/images/s1/6.jpg', 
     '/home/username/images/s1/3.jpg', 
     '/home/username/images/s1/5.jpg', 
     '/home/username/images/s1/10.jpg', 
     '/home/username/images/s1/9.jpg', 
     '/home/username/images/s1/1.jpg', 
     '/home/username/images/s1/2.jpg', 
     '/home/username/images/s1/12.jpg', 
     '/home/username/images/s1/11.jpg', 
     '/home/username/images/s1/8.jpg']

a = [Path(i) for i in a]
sorted_a = sorted(a, key=lambda i: int(i.stem))
# convert the sorted Path objects (not the unsorted list a) back to strings
sorted_a = [str(i) for i in sorted_a]

output:

['/home/username/images/s1/1.jpg',
 '/home/username/images/s1/2.jpg',
 '/home/username/images/s1/3.jpg',
 '/home/username/images/s1/4.jpg',
 '/home/username/images/s1/5.jpg',
 '/home/username/images/s1/6.jpg',
 '/home/username/images/s1/7.jpg',
 '/home/username/images/s1/8.jpg',
 '/home/username/images/s1/9.jpg',
 '/home/username/images/s1/10.jpg',
 '/home/username/images/s1/11.jpg',
 '/home/username/images/s1/12.jpg']

In general, using pathlib can sometimes give cleaner code expressions than plain os.path.
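As a small illustration of that point (a sketch, not part of the original answer): `Path(p).stem` and `os.path.splitext(os.path.basename(p))[0]` extract the same piece of the path, and the sort key can build the `Path` on the fly so that the list stays a list of strings. The shortened list `sample` is hypothetical.

```python
import os
from pathlib import Path

# Shortened, hypothetical version of the list from the question.
sample = ['/home/username/images/s1/10.jpg',
          '/home/username/images/s1/2.jpg',
          '/home/username/images/s1/1.jpg']

# Both expressions yield the file name without its extension.
stems_pathlib = [Path(p).stem for p in sample]
stems_os_path = [os.path.splitext(os.path.basename(p))[0] for p in sample]

# Building the Path inside the key avoids converting the list back and forth.
sorted_sample = sorted(sample, key=lambda p: int(Path(p).stem))

print(stems_pathlib == stems_os_path)
print(sorted_sample)
```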

Shir
1

You can split on "/", take the last element, split that on ".", take the first element, and convert it to an int:

l = ['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg', '/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg', '/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg', '/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg', '/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg', '/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']
sorted_list = sorted(l, key = lambda x: int(x.split("/")[-1].split(".")[0]))

output

['/home/username/images/s1/1.jpg',
 '/home/username/images/s1/2.jpg',
 '/home/username/images/s1/3.jpg',
 '/home/username/images/s1/4.jpg',
 '/home/username/images/s1/5.jpg',
 '/home/username/images/s1/6.jpg',
 '/home/username/images/s1/7.jpg',
 '/home/username/images/s1/8.jpg',
 '/home/username/images/s1/9.jpg',
 '/home/username/images/s1/10.jpg',
 '/home/username/images/s1/11.jpg',
 '/home/username/images/s1/12.jpg']
Tbaki
  • I guess you should also add the point that, unlike `.sort()`, `sorted()` doesn't sort the list in place, so we need to save the result. For example: `l = sorted(l, key=lambda i: int(os.path.splitext(os.path.basename(i))[0]))` – Kancha Jun 23 '17 at 21:05
  • @Varun Thanks, didn't think about that. :) – Tbaki Jun 26 '17 at 07:39
1

The other answers here are good, but I would still like to post mine with some explanations.

from os.path import basename,splitext
path_list = ['/home/username/images/s1/4.jpg', '/home/username/images/s1/7.jpg',
             '/home/username/images/s1/6.jpg', '/home/username/images/s1/3.jpg',
             '/home/username/images/s1/5.jpg', '/home/username/images/s1/10.jpg',
             '/home/username/images/s1/9.jpg', '/home/username/images/s1/1.jpg',
             '/home/username/images/s1/2.jpg', '/home/username/images/s1/12.jpg',
             '/home/username/images/s1/11.jpg', '/home/username/images/s1/8.jpg']

new_list = [splitext(basename(x))[0] for x in path_list]

fin_list = list(zip(path_list,new_list))

fin_list = [x[0] for x in sorted(fin_list,key=lambda x: int(x[1]))]

print(fin_list)

1) Create a list that holds only the file names without their extensions: '1', '2', and so on.

new_list = [splitext(basename(x))[0] for x in path_list]

Note: why [0]? Because each splitext(basename(x)) call returns a tuple like this:

('1', '.jpg'), ('4', '.jpg')

so index [0] gives us just the file name without the extension.

2) Zip the two lists together so that each path is paired with its extension-less file name, and create a list from the result. This list has values like these:

fin_list = list(zip(path_list,new_list))
#output
('/home/username/images/s1/4.jpg', '4')

3) [x[0] for x in sorted(fin_list,key=lambda x: int(x[1]))]

This sorts fin_list and then builds a list of just the paths from the sorted result. The key is the important part: it is the second item of each tuple ('4', '3', '7', and so on) converted to int, and the sorting is based on that value.

finally your output:

['/home/username/images/s1/1.jpg', '/home/username/images/s1/2.jpg',
 '/home/username/images/s1/3.jpg', '/home/username/images/s1/4.jpg',
 '/home/username/images/s1/5.jpg', '/home/username/images/s1/6.jpg',
 '/home/username/images/s1/7.jpg', '/home/username/images/s1/8.jpg',
 '/home/username/images/s1/9.jpg', '/home/username/images/s1/10.jpg',
 '/home/username/images/s1/11.jpg', '/home/username/images/s1/12.jpg']
void
0

I find this neat

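from pathlib import Path  # pathlib is part of the Python standard library
# Path(...).name is a string and would sort lexicographically ('10.jpg' before '2.jpg'),
# so sort on the integer value of the stem instead.
sorted_files = sorted(files, key=lambda image_path: int(Path(image_path).stem))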
Eric O.
0

To piggyback off of Shir's answer, if your file names are version numbers such as 1.0.ext, 2.3.4.ext, 3.0.ext, you can use:

import re
from pathlib import Path

files = Path('/your/path/here').glob('*.ext')

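# Keep only stems made purely of dot-separated integers (e.g. 1.0, 2.3.4);
# a raw string avoids invalid-escape warnings, fullmatch rejects stems like 1.0.beta.
files = [
    f for f in files
    if re.fullmatch(r"[0-9]+(\.[0-9]+)+", f.stem)
]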

files = sorted(
    files,
    key=lambda s: [int(u) for u in s.stem.split('.')]
)
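To see why a list of ints works as the key here, the following sketch sorts a few hypothetical file names without touching the file system. Python compares lists element by element, so `[2, 3, 4]` sorts before `[2, 10]`, which in turn sorts before `[10, 0]`. The names in `names` and the helper `version_key` are made up for illustration.

```python
from pathlib import Path

# Hypothetical file names; nothing is read from disk.
names = ["10.0.ext", "2.3.4.ext", "2.10.ext", "1.0.ext"]


def version_key(name):
    """Turn '2.3.4.ext' into [2, 3, 4] so that lists compare component by component."""
    return [int(part) for part in Path(name).stem.split(".")]


ordered = sorted(names, key=version_key)
print(ordered)
```

A plain string sort would put "10.0.ext" before "2.3.4.ext", which is the same problem the original question ran into.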
Tyler