3

I'm trying to convert a multipage PDF file to an image with PyMuPDF:

pdffile = "input.pdf"
doc = fitz.open(pdffile)
page = doc.loadPage()  # number of page
pix = page.getPixmap()
output = "output.tif"
pix.writePNG(output)

But I need to convert all the pages of the PDF file into a single multi-page TIFF image. When I give the page argument a page range, it only takes one page. Does anyone know how I can do this?

David Delos

4 Answers

7
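import fitz  # PyMuPDF
from PIL import Image

input_pdf = "input.pdf"
output_name = "output.tif"
# Pillow compression names: "tiff_deflate", "tiff_lzw", "group4" (group4 needs a 1-bit image)
compression = "tiff_deflate"

zoom = 2  # scale factor relative to the 72 dpi default, so 2 renders at 144 dpi
mat = fitz.Matrix(zoom, zoom)
dpi = 72 * zoom  # resolution actually rendered, stored in the TIFF metadata

doc = fitz.open(input_pdf)
image_list = []
for page in doc:
    pix = page.get_pixmap(matrix=mat)  # RGB without alpha by default
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    image_list.append(img)

if image_list:
    # save_all + append_images writes every page into one multi-page TIFF
    image_list[0].save(
        output_name,
        save_all=True,
        append_images=image_list[1:],
        compression=compression,
        dpi=(dpi, dpi),
    )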
ZdPo Ster
  • A [multi-page tiff](https://smallbusiness.chron.com/open-multipage-tif-28815.html) per document is needed. There seems to be some kind of [support, at least for reading](https://stackoverflow.com/a/18658742/1136400). – Doncho Gunchev Sep 01 '22 at 19:28
4

When you want to convert all pages of the PDF, you need a for loop. Also, when you call .get_pixmap(), pass a matrix (matrix=mat) to increase the resolution. Here is a code snippet (not sure if this is what you wanted, but it will convert every page of the PDF to a separate image):

import os
import fitz  # PyMuPDF

pdf_file = "input.pdf"  # assumed name, taken from the question
doc = fitz.open(pdf_file)
zoom = 2  # to increase the resolution
mat = fitz.Matrix(zoom, zoom)
image_folder = '/path/to/where/to/save/your/images'

for page_no in range(doc.page_count):
    page = doc.load_page(page_no)  # zero-based page number
    pix = page.get_pixmap(matrix=mat)

    # the file extension selects the image format (PNG here), change it as needed
    output = os.path.join(image_folder, str(page_no) + '.png')
    pix.save(output)
    print('Converting PDF page to image ... ' + output)
    # do your things afterwards

For resolution, here is a good example from GitHub that demonstrates what it means and how it's used, if you need it for your case.
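
To make the zoom factor concrete: PDF page sizes are measured in points, 72 per inch, so an unscaled pixmap is rendered at 72 dpi and a zoom of 2 gives 144 dpi. Below is a minimal sketch of that arithmetic. The helper names `zoom_for_dpi` and `pixel_size` are made up for illustration and are not part of PyMuPDF, and the pixel size is only approximate because the library does its own rounding.

```python
POINTS_PER_INCH = 72  # PDF page coordinates use 72 points per inch


def zoom_for_dpi(dpi):
    """Return the zoom factor that renders a page at the requested dpi."""
    return dpi / POINTS_PER_INCH


def pixel_size(width_pt, height_pt, zoom):
    """Approximate pixmap size in pixels for a page given in points."""
    return round(width_pt * zoom), round(height_pt * zoom)
```

For example, a US Letter page (612 x 792 points) rendered with `zoom_for_dpi(300)` comes out at roughly 2550 x 3300 pixels. The resulting factor would be passed as `fitz.Matrix(z, z)`, as in the snippet above.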

liamsuma
  • A [multi-page tiff](https://smallbusiness.chron.com/open-multipage-tif-28815.html) per document is needed. There seems to be some kind of [support, at least for reading](https://stackoverflow.com/a/18658742/1136400). – Doncho Gunchev Sep 01 '22 at 19:29
1
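import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
for i, page in enumerate(doc, start=1):
    pix = page.get_pixmap()
    output = "output_" + str(i) + ".tif"  # one file per page
    # Pixmap.save() cannot write TIFF, so save through Pillow (must be installed)
    pix.pil_save(output)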
Roizy Kish
  • Won't this just write the first, overwrite with the second ... and finally overwrite with the last page from the PDF? – Doncho Gunchev Sep 01 '22 at 19:30
  • You're right, it would, I'll add a counter to avoid that. – Roizy Kish Sep 01 '22 at 19:47
  • That will result in multiple files per document, not a multi-page tiff. See my comment on the other answers. [multipage-tiff](https://smallbusiness.chron.com/open-multipage-tif-28815.html) No idea if it's doable with Python; reading such files seems to be implemented, though. – Doncho Gunchev Sep 01 '22 at 19:50
  • In that case you can merge the files after they are converted. – Roizy Kish Sep 01 '22 at 19:56
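
The merge step mentioned in the last comment could look roughly like the sketch below, which uses Pillow to combine already-written single-page files into one multi-page TIFF. The function name `merge_tiffs` and its parameters are hypothetical, and the default compression name assumes a Pillow build with libtiff support.

```python
from PIL import Image


def merge_tiffs(page_paths, output_path, compression="tiff_deflate"):
    """Combine single-page image files into one multi-page TIFF.

    Returns the number of pages written (0 if page_paths is empty).
    """
    frames = []
    for path in page_paths:
        with Image.open(path) as img:
            frames.append(img.copy())  # copy so the source file can be closed
    if not frames:
        return 0
    # save_all + append_images stores every frame in the same file
    frames[0].save(
        output_path,
        save_all=True,
        append_images=frames[1:],
        compression=compression,
    )
    return len(frames)
```

The pages are written in the order given, so the paths should be sorted numerically first; a plain string sort would put `output_10.tif` before `output_2.tif`.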
1

PyMuPDF supports a limited set of image types for output. TIFF is not among them.

However, there is an easy way to interface with Pillow, which supports multiframe TIFF output.
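
One way this hand-off to Pillow might be sketched is shown below. The helper names `pixmap_to_pil` and `pdf_to_multipage_tiff` are illustrative only, not PyMuPDF or Pillow functions, and the sketch assumes pixmaps rendered in the default RGB colorspace and a Pillow build with libtiff for the LZW compression.

```python
from PIL import Image


def pixmap_to_pil(pix):
    """Convert an RGB PyMuPDF Pixmap (with or without alpha) to a Pillow image."""
    mode = "RGBA" if pix.alpha else "RGB"
    return Image.frombytes(mode, (pix.width, pix.height), pix.samples)


def pdf_to_multipage_tiff(pdf_path, tiff_path, dpi=150):
    """Render every page of a PDF and write one multi-page TIFF.

    Returns the number of pages written.
    """
    import fitz  # PyMuPDF, imported here so pixmap_to_pil works without it

    zoom = dpi / 72  # PDF pages are rendered at 72 dpi when unscaled
    mat = fitz.Matrix(zoom, zoom)
    with fitz.open(pdf_path) as doc:
        frames = [pixmap_to_pil(page.get_pixmap(matrix=mat)) for page in doc]
    if frames:
        frames[0].save(
            tiff_path,
            save_all=True,
            append_images=frames[1:],
            compression="tiff_lzw",
            dpi=(dpi, dpi),
        )
    return len(frames)
```

A call such as `pdf_to_multipage_tiff("input.pdf", "output.tif")` would then produce a single file with one frame per page. All frames are held in memory at once, which may matter for long documents at high resolution.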

Jorj McKie