How to read pdf images as opencv images using PyMuPDF?

Question

I would like to read all images found in a PDF file with PyMuPDF as OpenCV images, staying as close to the source as possible (avoiding format conversions that would lose precision). Basically, I would like the result to be exactly the same as if I had called cv2.imread(filename), in terms of output type, color space, etc.

# Libraries
import os
import cv2
import fitz
import numpy as np

# Input file
filename = "myfile.pdf"

# Read all images in file as a list of opencv images
def read_images(filename):
    images = []
    _, extension = os.path.splitext(filename)
    # If it's a pdf process each image
    if (extension == ".pdf"):
        pdf = fitz.open(filename)
        for index in range(len(pdf)):
            page = pdf[index]
            for im in page.get_images():  # named getImageList() in older PyMuPDF releases
                xref = im[0]
                pix = fitz.Pixmap(pdf, xref)
                images.append(pix_to_opencv_image(pix)) # DO SOMETHING HERE
    # Otherwise just do an imread 
    else:
        images.append(cv2.imread(filename))
    return images

Basically I would like to know what the function pix_to_opencv_image should be:

# Equivalent of doing a "cv2.imread" on a pdf pixmap:
def pix_to_opencv_image(pix):
    # DO SOMETHING HERE

I found examples explaining how to convert PDF pixmaps to numpy arrays, but nothing that outputs an OpenCV image.

How can I achieve this?

Try `np.asarray(the_pixmap.samples_mv)` and see what .shape that has. I'm betting `(width*height*nchannels,)`, so you only have to reshape that using some other properties of the pixmap object. If that doesn't work, call `.tobytes()` to get the file data, turn that into a numpy array, and then use `cv.imdecode`. – Christoph Rackwitz, Jul 03 '22 at 17:49

Jeru Luke · Accepted Answer · 2022-07-03T18:13:34.247


I used the help() function to list the data descriptors of the pixmap: help(pix)

pix.samples stores the image information as bytes. Using numpy's frombuffer, the image array can be obtained from these bytes after reshaping accordingly.

pix.height and pix.width give the height and width of the image array, respectively. pix.n is the number of channels per pixel (including alpha, if present). These are used to reshape the resulting array.

Your complete function would be:

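def pix_to_opencv_image(pix):
    # View the raw sample bytes as a flat uint8 array (named to avoid shadowing the built-in bytes)
    samples = np.frombuffer(pix.samples, dtype=np.uint8)
    # Reshape to rows x columns x channels
    img = samples.reshape(pix.height, pix.width, pix.n)
    return img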

You can display the result using cv2.imshow().

edited Jul 03 '22 at 18:13

answered Jul 03 '22 at 17:53

Jeru Luke


You need to convert `img` from RGB to BGR with `cv2.cvtColor()` before passing it to `cv2.imshow()`, so that the image keeps its original colors. – dhiraj suvarna Sep 21 '22 at 04:10
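
Putting the answer's reshape and the channel-order remark above together, a helper that mimics cv2.imread's default output (3-channel BGR, uint8) could be sketched as follows. It is a minimal sketch, not a definitive implementation: it swaps channels with NumPy slicing instead of cv2.cvtColor, assumes an 8-bit grayscale or RGB pixmap with an optional alpha channel, and rejects anything else. A CMYK pixmap, for example, would first need converting, e.g. with fitz.Pixmap(fitz.csRGB, pix). The name `pix_to_bgr` is illustrative.

```python
import numpy as np

def pix_to_bgr(pix):
    """Sketch: convert a PyMuPDF Pixmap to a BGR uint8 array shaped (height, width, 3)."""
    samples = np.frombuffer(pix.samples, dtype=np.uint8)
    img = samples.reshape(pix.height, pix.width, pix.n)

    # pix.n counts the alpha channel too, so subtract it to get the color channels.
    color_channels = pix.n - int(pix.alpha)

    if color_channels == 1:
        # Grayscale: replicate the single channel, as cv2.imread does by default.
        return np.repeat(img[:, :, :1], 3, axis=2)
    if color_channels == 3:
        # RGB or RGBA: take channels 2, 1, 0 (B, G, R); any alpha channel is dropped.
        return np.ascontiguousarray(img[:, :, 2::-1])

    # Assumption: other layouts (e.g. CMYK) are converted to RGB by the caller first.
    raise ValueError(f"unsupported pixmap layout: n={pix.n}, alpha={pix.alpha}")
```

In the question's loop this would be used as `images.append(pix_to_bgr(pix))`. The returned array is a copy, so it stays valid and writable after the pixmap is released.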
