
I have some PDFs, and I want to read them as images to get all the pixel info.

So I first tried to convert the PDF into JPEG:

from pdf2image import convert_from_path
img = convert_from_path('mypdf.pdf')

This works. Next I tried to get the pixel info, but I get an error:

import matplotlib.pyplot as plt
pixel_img = plt.imread(img[0])

TypeError: Object does not appear to be a 8-bit string path or a Python file-like object

I don't understand this, since plt.imread() seems to work when I use it to read an original .jpeg. img[0] is a PIL object, so shouldn't it be a "Python file-like object"?

I also tried the PIL package (since img[0] is a PIL object) and a different reading method, but all I get is another error:

from PIL import Image    
pixel_img = Image.open(img[0])

AttributeError: 'PpmImageFile' object has no attribute 'read'

This link is not exactly what I want, because it just saves the PDF as JPG. I don't want to save it; I just want to read it and get the pixel info.

Thanks

GonzaloReig

1 Answer


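convert_from_path returns a list of PIL images, so you must not treat them as files: plt.imread and Image.open expect a file path or a file-like object, not an image that is already loaded.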
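To illustrate, each element of that list is already a PIL `Image.Image`, so its pixels can be read directly without opening anything. The sketch below uses a small hypothetical stand-in page built with `Image.new` (so it runs without a PDF or poppler); with a real PDF you would use `images[0]` from `convert_from_path` instead.

```python
from PIL import Image

# Hypothetical stand-in for one page returned by convert_from_path:
# a 30x20 (width x height) white RGB image
page = Image.new("RGB", (30, 20), color=(255, 255, 255))
page.putpixel((15, 10), (255, 0, 0))  # PIL coordinates are (x, y)

print(isinstance(page, Image.Image))  # True: an image object, not a file
print(page.size)                      # (30, 20), i.e. (width, height)
print(page.getpixel((15, 10)))        # (255, 0, 0)
```

Note that PIL's `getpixel` takes `(x, y)`, which is the opposite order from NumPy's `[y, x]` indexing.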

The following converts the pages of a PDF to PIL images, converts the first page to a NumPy array (for easy access to pixels), and gets the pixel at position y=10, x=15:

from pdf2image import convert_from_path
import numpy as np

images = convert_from_path('test.pdf')

# to numpy array
image = np.array(images[0])

# get pixel at position y=10, x=15
# (NumPy indexes as [row, column], i.e. [y, x])
# pix is an array of R, G, B values,
# e.g. pix[0] is the red part of the pixel
pix = image[10, 15]
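The same idea extends to every page. A minimal sketch, assuming RGB pages (the list below is a hypothetical stand-in for the output of `convert_from_path('test.pdf')`, built with `Image.new` so it runs without a PDF): each array has shape `(height, width, 3)` with `uint8` values, and `image[y, x]` matches PIL's `getpixel((x, y))`.

```python
import numpy as np
from PIL import Image

# Hypothetical stand-ins for the list returned by convert_from_path('test.pdf')
images = [Image.new("RGB", (30, 20), color=(255, 255, 255)) for _ in range(2)]
images[0].putpixel((15, 10), (12, 34, 56))  # PIL uses (x, y)

# One NumPy array per page; each has shape (height, width, 3), dtype uint8
arrays = [np.array(page) for page in images]
image = arrays[0]

print(image.shape)  # (20, 30, 3)
print(image.dtype)  # uint8

# NumPy indexes as [row, column], i.e. [y, x]
pix = image[10, 15]
print(pix)  # [12 34 56]

# Same pixel through PIL, which takes (x, y)
assert tuple(int(c) for c in pix) == images[0].getpixel((15, 10))
```

Nothing is written to disk at any point: the pages stay in memory as PIL images and arrays.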
Tankred
  • But with that I just save the file as jpg. What I want is to read it and get the pixel info, not save it as .jpg – GonzaloReig Aug 26 '19 at 12:40
  • images[0] is a standard PIL image. Maybe this will help you get the pixels: https://stackoverflow.com/a/11064935/5665958. Another possibility is to convert it to a numpy array, which might or might not be easier to work with (`np_image = numpy.array(images[0])`). – Tankred Aug 26 '19 at 12:42
  • I updated the answer to show how to get a single pixel – Tankred Aug 26 '19 at 12:52
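If some function really insists on a path or file-like object (as plt.imread and Image.open do), one option is to encode the page into an in-memory `io.BytesIO` buffer, which never touches the disk. This is a sketch using a hypothetical stand-in page; for plain pixel access the NumPy conversion in the answer is simpler.

```python
import io
from PIL import Image

# Hypothetical stand-in for one page returned by convert_from_path
page = Image.new("RGB", (30, 20), color=(0, 128, 255))

# Encode the page into an in-memory buffer; nothing is written to disk
buf = io.BytesIO()
page.save(buf, format="PNG")
buf.seek(0)  # rewind so the reader starts at the beginning

# buf is a genuine file-like object, so APIs that want a file can read it
reopened = Image.open(buf)
print(reopened.format)            # PNG
print(reopened.getpixel((0, 0)))  # (0, 128, 255)
```

matplotlib's `plt.imread` is documented to accept file-like objects too, but note that it returns PNG data as floats in the 0 to 1 range rather than 0 to 255 integers.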