13

I am trying to create a sorted list of files in the ./pages directory. This is what I have so far:

import numpy as np
from PIL import Image
import glob
from pathlib import Path


# sorted( l, key=lambda a: int(a.split("-")[1]) )
image_list = []

for filename in Path('./pages').glob('*.jpg'):
#     sorted( i, key=lambda a: int(a.split("_")[1]) )
#     im=Image.open(filename)
    image_list.append(filename)

print(*image_list, sep = "\n")

Current output:

pages/page_1.jpg  
pages/page_10.jpg  
pages/page_11.jpg  
pages/page_12.jpg  
pages/page_2.jpg  
pages/page_3.jpg  
pages/page_4.jpg  
pages/page_5.jpg  
pages/page_6.jpg  
pages/page_7.jpg  
pages/page_8.jpg  
pages/page_9.jpg  

Expected Output:

pages/page_1.jpg   
pages/page_2.jpg  
pages/page_3.jpg  
pages/page_4.jpg  
pages/page_5.jpg  
pages/page_6.jpg  
pages/page_7.jpg  
pages/page_8.jpg  
pages/page_9.jpg  
pages/page_10.jpg  
pages/page_11.jpg  
pages/page_12.jpg

I've tried the solutions found in the duplicate, but they don't work because the paths returned by pathlib are objects, not strings. They only look like filenames when I print them.

For example:

print(filename) # pages/page_1.jpg  
print(type(filename)) # <class 'pathlib.PosixPath'>
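
To illustrate the point, here is a small sketch of how such a path object can be turned into plain strings that string-based solutions can work with. It uses `PurePosixPath` (an assumption made only so that it runs without any files on disk); the path is the example from above.

```python
from pathlib import PurePosixPath

# PurePosixPath offers the same attributes as Path but never touches the disk.
p = PurePosixPath('pages/page_10.jpg')

as_string = str(p)   # whole path as a plain string
name = p.name        # final path component, with extension
stem = p.stem        # final path component, without extension

print(as_string, name, stem)
```

Any of these three values is an ordinary `str`, so calls such as `split("_")` work on them.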

Finally, this is working code. Thanks to all.

from pathlib import Path
import numpy as np
from PIL import Image
import natsort

def merge_to_single_image():
    image_list1 = []
    image_list2 = []
    image_list3 = []
    image_list4 = []

    for filename in Path('./pages').glob('*.jpg'):
        image_list1.append(filename)

    for i in image_list1:
        image_list2.append(i.stem)
    #     print(type(i.stem))

    image_list3 = natsort.natsorted(image_list2, reverse=False)

    for i in image_list3:
        i = str(i)+ ".jpg"
        image_list4.append(Path('./pages', i))

    images = [Image.open(i) for i in image_list4]
    # for a vertical stacking it is simple: use vstack
    images_combined = np.vstack(images)
    images_combined = Image.fromarray(images_combined)
    images_combined.save('Single_image.jpg')
PrasadHeeramani
  • Do all files have the same `page_` prefix? – accdias Oct 09 '19 at 21:01
  • The filenames are generated by me, so the `page_` prefix is not compulsory. They could also be 1.jpg, 2.jpg, 3.jpg, ... , 10.jpg, 11.jpg – PrasadHeeramani Oct 10 '19 at 00:23
  • You just need to get a string from the path object first; its `stem` (the filename without the extension) works here. Try this: `for filename in sorted(Path('./pages').glob('*.jpg'), key=lambda a: int(a.stem.split("_")[1])):` – Lord Elrond Oct 10 '19 at 04:57
  • **Solution:** since the filenames are created by you, write them with zero-padded numbers, like [below](https://stackoverflow.com/a/73533144/1207193). Sorting will then be easy. – imbr Aug 29 '22 at 18:55

5 Answers

5

One can use the natsort library (`pip install natsort`). The code stays simple, too.
[This works; tested with versions 5.5 and 7.1 (current).]

from pathlib import Path

from natsort import natsorted

image_list = Path('./pages').glob('*.jpg')
image_list = natsorted(image_list, key=str)

# Or convert the list of paths to a list of strings, sort it naturally, then convert back to a list of paths
image_list = [Path(p) for p in natsorted([str(p) for p in image_list])]
Jaja
  • If paths have different parents, look in [docs](https://natsort.readthedocs.io/en/master/howitworks.html?highlight=pathlib#sorting-filesystem-paths) – Jaja Jun 01 '21 at 08:10
5

Just for posterity, maybe this is more succinct?

natsorted(list_of_pathlib_objects, key=str)
OlleNordesjo
3

Note that sorted doesn't sort your data in place, but returns a new list, so you have to iterate on its output.

In order to get your sorting key, which is the integer value at the end of your filename:

  • You can first take the stem of your path, which is its final component without extension (so, for example, 'page_13').

  • Then, it is better to split it once from the right, in order to be safe in case your filename contains other underscores in the first part, like 'some_page_33.jpg'.

  • Once converted to int, you have the key you want for sorting.

So, your code could look like:

for filename in sorted(Path('./pages').glob('*.jpg'), 
                       key=lambda path: int(path.stem.rsplit("_", 1)[1])):

    print(filename)

Sample output:

pages/ma_page_2.jpg
pages/ma_page_11.jpg
pages/ma_page_13.jpg
pages/ma_page_20.jpg
Thierry Lathuille
3

Just use it like this:

from pathlib import Path

- sorted by name:

sorted(Path('anywhere/you/want').glob('*.jpg'))

- sorted by modification time:

import os
sorted(Path('anywhere/you/want').glob('*.jpg'), key=os.path.getmtime)

- sorted by size:

import os
sorted(Path('anywhere/you/want').glob('*.jpg'), key=os.path.getsize)

etc.

Hint: since the filenames are also created by you, write them with zero-padded numbers, like:

for i in range(100):
    # py3.6+ f-string; the str.format() equivalent is 'filename' + '_{:03d}'.format(i)
    with open('filename' + f'_{i:03d}', 'wb') as f:
        pass  # write your file stuff...
 ...
 'filename_007',
 'filename_008',
 'filename_009',
 'filename_010',
 'filename_011',
 'filename_012',
 'filename_013',
 'filename_014',
 ...
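
A quick way to see why padding helps is to compare how plain `sorted()` orders padded and unpadded names. This is a small sketch; the numbers are arbitrary and no files are created.

```python
numbers = (12, 7, 100, 9)

# Same numbers, once zero-padded to three digits and once as-is.
padded = [f"filename_{i:03d}" for i in numbers]
unpadded = [f"filename_{i}" for i in numbers]

# Default string comparison is character by character.
sorted_padded = sorted(padded)
sorted_unpadded = sorted(unpadded)

print(sorted_padded)
print(sorted_unpadded)
```

With padding, the lexicographic order matches the numeric order; without it, `filename_100` sorts before `filename_12`. A width of three digits only covers numbers up to 999, so the width has to be chosen with the largest expected number in mind.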
imbr
1

The problem is not as easy as it sounds. "Natural" sorting can be quite challenging, especially with potentially arbitrary input strings; e.g., what if you have "69_helloKitty.jpg" in your data? I used https://github.com/SethMMorton/natsort a while ago for a similar problem; maybe it helps you.
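
To give an idea of what is involved, here is a minimal standard-library sketch of a natural sort key. It is not how natsort is implemented; `natural_key` is a hypothetical helper, and it only handles unsigned integers (no signs, decimals, or locale rules).

```python
import re
from pathlib import PurePosixPath


def natural_key(path):
    """Split the path string into alternating text and integer chunks."""
    # With a capturing group, re.split keeps the digit runs, so the result
    # always alternates: text, digits, text, ..., text (text may be empty).
    chunks = re.split(r"(\d+)", str(path))
    # Odd positions are digit runs; compare them as integers.
    return [int(c) if i % 2 else c.lower() for i, c in enumerate(chunks)]


names = ["pages/page_10.jpg", "pages/69_helloKitty.jpg",
         "pages/page_2.jpg", "pages/page_1.jpg"]
paths = [PurePosixPath(n) for n in names]

ordered = sorted(paths, key=natural_key)

for path in ordered:
    print(path)
```

Because text and number chunks always sit at the same positions, two keys never compare a string against an integer, which is what makes names such as "69_helloKitty.jpg" sortable alongside "page_2.jpg".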

Christian Sauer