Convert PDF file to multipage image

Question

I'm trying to convert a multipage PDF file to an image with PyMuPDF:

pdffile = "input.pdf"
doc = fitz.open(pdffile)
page = doc.loadPage()  # number of page
pix = page.getPixmap()
output = "output.tif"
pix.writePNG(output)

But I need to convert all the pages of the PDF file into a single multi-page TIFF image. When I pass a page range as the page argument, it still takes only one page. Does anyone know how I can do this?

score 7 · Answer 1 · answered Feb 12 '21 at 15:25

import fitz  # PyMuPDF
from PIL import Image

input_pdf = "input.pdf"
output_name = "output.tif"
compression = "tiff_deflate"  # or "tiff_lzw"; "group4" needs a binarized (1-bit) image

zoom = 2  # scale factor to increase the resolution (unscaled rendering is 72 dpi)
mat = fitz.Matrix(zoom, zoom)

doc = fitz.open(input_pdf)
image_list = []
for page in doc:
    # get_pixmap() is the current name; older PyMuPDF releases call it getPixmap()
    pix = page.get_pixmap(matrix=mat)
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
    image_list.append(img)

if image_list:
    # save_all + append_images writes every page into one multi-page TIFF
    image_list[0].save(
        output_name,
        save_all=True,
        append_images=image_list[1:],
        compression=compression,
        dpi=(300, 300),
    )
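
The comment on `compression` notes that "group4" needs a binarized image. A minimal sketch of that extra step, assuming a Pillow build with libtiff support; the helper name `to_bilevel`, the threshold value, and the synthetic stand-in pages are illustrative and not part of the answer:

```python
import os
import tempfile

from PIL import Image


def to_bilevel(img, threshold=128):
    """Convert an image to 1-bit black/white (mode "1") using a fixed threshold."""
    gray = img.convert("L")
    # Pixels at or above the threshold become white, everything else black.
    return gray.point(lambda p: 255 if p >= threshold else 0).convert("1")


# Synthetic stand-ins for rendered pages (assumption: real pages would come
# from the rendering loop in the answer).
pages = [Image.new("RGB", (200, 100), color) for color in ("white", "black", "white")]
bilevel_pages = [to_bilevel(p) for p in pages]

out_path = os.path.join(tempfile.mkdtemp(), "output_g4.tif")
bilevel_pages[0].save(
    out_path,
    save_all=True,
    append_images=bilevel_pages[1:],
    compression="group4",
)

# Read the file back to check that all pages ended up in one TIFF.
with Image.open(out_path) as tif:
    frame_count = tif.n_frames
    first_mode = tif.mode
```

A fixed threshold is the simplest choice; scanned or anti-aliased pages may look better with a different threshold or with dithering.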

A [multi-page tiff](https://smallbusiness.chron.com/open-multipage-tif-28815.html) per document is needed. There seems to be some kind of [support, at least for reading](https://stackoverflow.com/a/18658742/1136400). – Doncho Gunchev, Sep 01 '22 at 19:28

score 4 · Answer 2 · answered Oct 13 '20 at 21:27

To convert all pages of the PDF, you need a for loop. Also, when you render a page to a pixmap (.getPixmap() in older PyMuPDF releases, .get_pixmap() in current ones), pass a matrix argument such as matrix = mat to increase the resolution. Here is a code snippet (not sure if this is exactly what you wanted, but it converts every page of the PDF to an image):

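import os
import fitz  # PyMuPDF

pdf_file = "input.pdf"  # placeholder path
doc = fitz.open(pdf_file)
zoom = 2 # to increase the resolution
mat = fitz.Matrix(zoom, zoom)
noOfPages = doc.page_count  # pageCount in older PyMuPDF releases
image_folder = '/path/to/where/to/save/your/images'

for pageNo in range(noOfPages):
    page = doc.load_page(pageNo)  # zero-based page number (loadPage in older releases)
    pix = page.get_pixmap(matrix=mat)  # getPixmap in older releases

    # save() picks the format from the extension, so PNG data goes into a .png file
    output = os.path.join(image_folder, str(pageNo) + '.png')
    pix.save(output)  # writePNG in older releases
    print('Converting PDF page to image ... ' + output)
    # do your things afterwards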

For resolution, here is a good example from GitHub that demonstrates what it means and how it's used, in case you need it.
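
The link between the zoom factor and resolution comes from the PDF unit of 1/72 inch: an unscaled rendering is 72 dpi, so the zoom simply multiplies that. A small sketch of the arithmetic, with hypothetical helper names (`zoom_for_dpi`, `rendered_size`) and an A4 page of 595 x 842 points as the example; the exact pixel size PyMuPDF produces may differ by a pixel because of rounding:

```python
PDF_POINTS_PER_INCH = 72  # PDF user space: 1 point = 1/72 inch


def zoom_for_dpi(target_dpi):
    """Zoom factor that renders a page at target_dpi (unscaled rendering is 72 dpi)."""
    return target_dpi / PDF_POINTS_PER_INCH


def rendered_size(width_pt, height_pt, zoom):
    """Approximate pixel dimensions of a page given in points at this zoom."""
    return round(width_pt * zoom), round(height_pt * zoom)


zoom = 2
effective_dpi = zoom * PDF_POINTS_PER_INCH   # 144
a4_pixels = rendered_size(595, 842, zoom)    # (1190, 1684)
zoom_300 = zoom_for_dpi(300)                 # 300 / 72, roughly 4.17
```

With zoom = 2 the pages are rendered at 144 dpi, so a `dpi=(300, 300)` tag on the saved file would not match the pixel data. Passing `fitz.Matrix(zoom_300, zoom_300)` is one way to make the two agree.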

A [multi-page tiff](https://smallbusiness.chron.com/open-multipage-tif-28815.html) per document is needed. There seems to be some kind of [support, at least for reading](https://stackoverflow.com/a/18658742/1136400). – Doncho Gunchev, Sep 01 '22 at 19:29

score 1 · Answer 3 · answered Aug 31 '22 at 21:12

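import fitz  # PyMuPDF

pdffile = "input.pdf"
doc = fitz.open(pdffile)
# Number the output files so each page gets its own file instead of overwriting one.
for i, page in enumerate(doc, start=1):
    pix = page.get_pixmap()  # getPixmap() in older PyMuPDF releases
    output = "output_" + str(i) + ".tif"
    # PyMuPDF does not write TIFF itself; pil_save() hands the pixmap to Pillow,
    # which must be installed.
    pix.pil_save(output)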

edited Sep 01 '22 at 19:53

answered Aug 31 '22 at 21:12

Roizy Kish


Won't this just write the first, overwrite with the second ... and finally overwrite with the last page from the PDF? – Doncho Gunchev Sep 01 '22 at 19:30
You're right, it would, I'll add a counter to avoid that. – Roizy Kish Sep 01 '22 at 19:47
That will result in multiple files per document, not a multi-page TIFF ([multipage-tiff](https://smallbusiness.chron.com/open-multipage-tif-28815.html)). See my comment on the other answers. No idea if it's doable with Python, though reading such files seems to be implemented. – Doncho Gunchev Sep 01 '22 at 19:50
In that case you can merge the files after they are converted. – Roizy Kish Sep 01 '22 at 19:56
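
One possible way to do that merge is with Pillow, which can write several frames into one TIFF. This is a sketch under assumptions, not code from the thread: the helper name `merge_to_multipage_tiff` is made up, and the per-page files are generated synthetically here so the example runs without a PDF.

```python
import os
import tempfile

from PIL import Image


def merge_to_multipage_tiff(page_paths, output_path, compression="tiff_lzw"):
    """Combine single-page image files, in the given order, into one multi-page TIFF."""
    frames = []
    for path in page_paths:
        # convert() returns an in-memory copy, so the source file can be closed.
        with Image.open(path) as img:
            frames.append(img.convert("RGB"))
    if not frames:
        raise ValueError("no pages to merge")
    frames[0].save(
        output_path,
        save_all=True,
        append_images=frames[1:],
        compression=compression,
    )
    return len(frames)


# Synthetic per-page files standing in for the converted pages.
work_dir = tempfile.mkdtemp()
page_paths = []
for i in range(1, 4):
    path = os.path.join(work_dir, f"output_{i}.png")
    Image.new("RGB", (120, 80), "white").save(path)
    page_paths.append(path)

merged_path = os.path.join(work_dir, "output.tif")
merged_count = merge_to_multipage_tiff(page_paths, merged_path)

# Read the result back to confirm every page is a frame of the same file.
with Image.open(merged_path) as tif:
    frames_in_file = tif.n_frames
```

The page order follows the order of `page_paths`, so with many files it is worth sorting them numerically rather than relying on directory listing order.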

score 1 · Answer 4 · answered Sep 04 '22 at 15:26


PyMuPDF supports a limited set of image types for output. TIFF is not among them.

However, there is an easy way to interface with Pillow, which supports multiframe TIFF output.


Jorj McKie

