4

I am trying to convert a PDF file to an image file. For this, I have installed the following on my Ubuntu server:

  1. python2.7
  2. poppler-utils
  3. pdf2image==1.12.1

My code:

from pdf2image import convert_from_path, convert_from_bytes

images = convert_from_path("/home/user/pdf_file.pdf")

# OR

with open("/home/user/pdf_file.pdf", "rb") as pdf:  # binary mode: convert_from_bytes expects raw bytes
    images = convert_from_bytes(pdf.read())

OUTPUT

When I use the function "convert_from_path":

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
    thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator

When I use the function "convert_from_bytes":

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 268, in convert_from_bytes
    paths_only=paths_only,
  File "/usr/local/lib/python2.7/dist-packages/pdf2image/pdf2image.py", line 143, in convert_from_path
    thread_output_file = next(output_file)
TypeError: ThreadSafeGenerator object is not an iterator

I reinstalled all my utilities, and since then I have been facing these problems.

RokiDGupta
According to PyPI (https://pypi.org/project/pdf2image/), Python 2.7 does not seem to be supported. For version 1.12.1 the description clearly says: "A python (3.5+) module that wraps pdftoppm and pdftocairo to convert PDF to a PIL Image object". – Kris Mar 16 '20 at 06:51

2 Answers

5

If you want to convert a PDF to an image, you can try the Python ghostscript package:

pip install ghostscript

import ghostscript
import locale

def pdf2jpeg(pdf_input_path, jpeg_output_path):
    args = ["pdf2jpeg", # actual value doesn't matter
            "-dNOPAUSE",
            "-sDEVICE=jpeg",
            "-r144",
            "-sOutputFile=" + jpeg_output_path,
            pdf_input_path]

    encoding = locale.getpreferredencoding()
    args = [a.encode(encoding) for a in args]

    ghostscript.Ghostscript(*args)

pdf2jpeg(
    "...Fixate/ActiveState/pdf/a.pdf",
    "...Fixate/ActiveState/pdf/a.jpeg",
)
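
One caveat with a fixed output name: for a multi-page PDF, each page would be written to the same file. Ghostscript expands a %d-style placeholder in -sOutputFile to the page number, and -dBATCH makes it exit once the input has been processed. A minimal sketch of building such an argument list (the helper name build_gs_args and the file names are only placeholders for illustration, not part of the ghostscript package):

```python
def build_gs_args(pdf_input_path, output_pattern, dpi=144):
    """Build a Ghostscript argument list that writes one JPEG per page.

    output_pattern should contain a %d-style placeholder, e.g. "page-%03d.jpeg",
    which Ghostscript itself replaces with the page number.
    """
    return [
        "pdf2jpeg",                        # program name slot; actual value doesn't matter
        "-dNOPAUSE",                       # do not pause between pages
        "-dBATCH",                         # exit after the last input file
        "-sDEVICE=jpeg",
        "-r%d" % dpi,                      # resolution in DPI
        "-sOutputFile=" + output_pattern,  # pattern is passed through unformatted
        pdf_input_path,
    ]


# Hypothetical file names, for illustration only.
args = build_gs_args("input.pdf", "page-%03d.jpeg")
```

The resulting list can be passed on in the same way as args in pdf2jpeg above.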
Mohit Chandel
4

It failed for me in Python 2 too, but worked in Python 3.

The same issue has been reported for another library: TypeError: 'threadsafe_iter' object is not an iterator

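As they said, it's a Python 2 vs 3 issue caused by the next() function: the built-in next() looks for a __next__() method in Python 3, but for a next() method in Python 2.
If you rename __next__() to next() in the file /home/***/.local/lib/python2.7/site-packages/pdf2image/generators.py, it runs successfully on Python 2.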
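
To illustrate the difference, here is a minimal sketch of a lock-protected iterator that works on both Python 2 and Python 3 by defining __next__() and aliasing it as next(). The class ThreadSafeIterator is a hypothetical stand-in written for this example; it is not the actual code from pdf2image's generators.py.

```python
import threading


class ThreadSafeIterator(object):
    """Hypothetical stand-in: wraps an iterable and serializes access with a lock."""

    def __init__(self, iterable):
        self._lock = threading.Lock()
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        # Python 3: the built-in next() calls this method.
        with self._lock:
            return next(self._it)

    # Python 2: the built-in next() looks for a method named "next" instead.
    next = __next__


names = ThreadSafeIterator(["page-1", "page-2"])
first = next(names)  # works on both Python 2 and Python 3
```

Without the next = __next__ alias, Python 2 raises the "object is not an iterator" TypeError shown in the question.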

BTW, I have opened a new issue with the pdf2image team:
TypeError: ThreadSafeGenerator object is not an iterator #133


Additional
The pdf2image README says it is a Python (3.5+) module.
pdf2image v1.7.1 works on Python 2.7. Try it with pip install pdf2image==1.7.1
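
If the same script has to run under several interpreters, a small guard can make the version requirement explicit before pdf2image is imported. This is only a sketch: the function name recent_pdf2image_supported is made up for this example, and the (3, 5) threshold is taken from the README statement quoted above.

```python
import sys


def recent_pdf2image_supported(version_info=sys.version_info):
    """Return True if the interpreter meets the README's "Python (3.5+)" requirement."""
    return tuple(version_info[:2]) >= (3, 5)


if not recent_pdf2image_supported():
    # On Python 2.7, pinning the older release is one option:
    #   pip install pdf2image==1.7.1
    sys.stderr.write("Recent pdf2image releases need Python 3.5+\n")
```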

gamesun