I have a folder with 452 images (.png) that I'm trying to merge into a single PDF file using Python. Each image is named after its intended page number, e.g. "1.png", "2.png", ..., "452.png".

This code was technically successful, but input the pages out of the intended order.

import os
import img2pdf
from PIL import Image
with open("output.pdf", 'wb') as f:
    f.write(img2pdf.convert([i for i in os.listdir('.') if i.endswith(".png")]))

I also tried opening each image in page order, converting it, and writing the result to the PDF, but this yields a 94 MB one-page PDF.

import img2pdf
from PIL import Image

with open("output.pdf", 'wb') as f:
    for i in range(1, 453):
        img = Image.open(f"{i}.png")
        pdf_bytes = img2pdf.convert(img)
        f.write(pdf_bytes)

Any help would be appreciated. I've done quite a bit of research but have come up short. Thanks in advance.

Dylan H.
  • While I'm not familiar with the libraries, I wonder if `os.listdir` is giving you an out-of-order result due to the way the image filenames are naturally sorted. Maybe pad the filenames with leading zeroes. – ggorlen Sep 21 '20 at 16:43
  • Load file names into a list, sort it the way you want, and then iterate it. – PM 77-1 Sep 21 '20 at 16:43
  • @Wups this will sort the file name lexicographically, not numerically – Mitch Sep 21 '20 at 16:47
  • @Wups: As long as OP is OK with lexicographic order that will put `2` in front of `11`. – PM 77-1 Sep 21 '20 at 16:50

1 Answer

"but input the pages out of the intended order"

I suspect that the intended order is "in numerical order of file name", i.e. 1.png, 2.png, 3.png, and so forth.
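To see why the listing comes out in the wrong order, here is a minimal sketch comparing a plain string sort with a numeric one. `os.listdir` makes no ordering guarantee at all, and a plain `sorted()` compares names character by character. The file names below are made up for illustration, and `page_number` is a hypothetical helper, not part of img2pdf.

```python
# Illustrative file names only; your folder would hold 1.png through 452.png.
names = ["10.png", "2.png", "1.png", "11.png"]


def page_number(fname):
    """Return the integer before the extension, e.g. '11.png' -> 11."""
    return int(fname.rsplit(".", 1)[0])


# Plain string sort: '1' < '2', so every name starting with 1 precedes 2.png.
lexicographic = sorted(names)

# Numeric sort: compare the page numbers themselves.
numeric = sorted(names, key=page_number)

print(lexicographic)
print(numeric)
```

The string sort puts `10.png` and `11.png` ahead of `2.png`, which is the kind of page shuffling described in the question.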

This can be solved with:

import os
import img2pdf

with open("output.pdf", 'wb') as f:
    f.write(img2pdf.convert(sorted([i for i in os.listdir('.') if i.endswith(".png")], key=lambda fname: int(fname.rsplit('.',1)[0]))))

This is a slightly modified version of your first attempt that simply sorts the file names numerically (the order your second attempt tried to impose) before batch-writing them to the PDF.
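One caveat with the sort key: `int(...)` raises `ValueError` if the folder contains a PNG whose name is not a number (say, a hypothetical "cover.png"). If that could happen, a slightly more defensive variant might look like the sketch below. The names `numbered_pngs` and `write_pdf` are my own, not part of img2pdf, and I'm assuming that any PNG without a purely numeric name should simply be skipped.

```python
import os


def numbered_pngs(folder="."):
    """Return paths of files named '<integer>.png', in numeric page order."""
    pages = []
    for name in os.listdir(folder):
        stem, ext = os.path.splitext(name)
        # Skip anything that is not '<digits>.png' instead of failing in int().
        if ext.lower() == ".png" and stem.isdecimal():
            pages.append((int(stem), os.path.join(folder, name)))
    # Tuples sort by page number first, so 2.png comes before 11.png.
    return [path for _, path in sorted(pages)]


def write_pdf(folder=".", output="output.pdf"):
    """Merge the numbered PNGs found in `folder` into a single PDF."""
    # Third-party package; imported here so numbered_pngs works without it.
    import img2pdf

    with open(output, "wb") as f:
        f.write(img2pdf.convert(numbered_pngs(folder)))
```

Calling `write_pdf()` from the folder that holds the images should then produce the same `output.pdf` as the one-liner above, just without tripping over stray files.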

inspectorG4dget
  • This is exactly what I was after - thanks a million. – Dylan H. Sep 21 '20 at 20:12
  • @DylanH. Glad I could help. But notice that I added one function call's worth of code. Really, you solved the problem yourself, and put two pieces of the solution in two different attempted solutions. So you should definitely feel good about having pretty much solved the problem yourself :) – inspectorG4dget Sep 21 '20 at 22:20