Questions tagged [pdf2image]

A wrapper around the pdftoppm and pdftocairo command-line tools that converts a PDF to a list of PIL Images.

pdf2image is a Python package that wraps pdftoppm and pdftocairo to convert PDF pages to PIL Image objects.
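To make "wraps pdftoppm" concrete, here is a minimal sketch of what calling the command-line tool from Python can look like. It is not pdf2image's actual code: the helper names `build_pdftoppm_command` and `render_pdf` are hypothetical, the default DPI is arbitrary, and only the `-r`, `-png`, `-jpeg`, `-f` and `-l` options of pdftoppm are used.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftoppm_command(pdf_path, output_prefix, dpi=200, fmt="png",
                           first_page=None, last_page=None):
    """Build the pdftoppm argument list (hypothetical helper, not pdf2image's API)."""
    format_flags = {"png": "-png", "jpeg": "-jpeg"}
    if fmt not in format_flags:
        raise ValueError(f"unsupported format: {fmt!r}")
    command = ["pdftoppm", "-r", str(dpi), format_flags[fmt]]
    if first_page is not None:
        command += ["-f", str(first_page)]  # first page to render
    if last_page is not None:
        command += ["-l", str(last_page)]   # last page to render
    # pdftoppm expects: [options] PDF-file output-root
    command += [str(pdf_path), str(output_prefix)]
    return command


def render_pdf(pdf_path, output_prefix, **options):
    """Run pdftoppm and return the files written under the output prefix."""
    if shutil.which("pdftoppm") is None:
        # Assumption: Poppler's tools must be discoverable on PATH.
        raise RuntimeError("pdftoppm not found; is Poppler installed and on PATH?")
    subprocess.run(build_pdftoppm_command(pdf_path, output_prefix, **options),
                   check=True)
    prefix = Path(output_prefix)
    return sorted(prefix.parent.glob(prefix.name + "*"))
```

Building the argument list separately from running it keeps the command easy to inspect when Poppler is missing or misconfigured, which is the situation several of the questions below describe.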


71 questions
12 votes, 3 answers

Converting pdf to png with python (without pdf2image)

I want to convert a one-page PDF into a PNG file. I installed pdf2image and get this error: poppler is not installed in windows. According to this question: Poppler in path for pdf2image, poppler should be installed and PATH modified. I cannot do…
JFerro
4 votes, 3 answers

What is the fastest way to convert PDF to JPG images?

I am trying to convert multiple PDFs (10k+) to JPG images and extract text from them. I am currently using the pdf2image Python library, but it is rather slow. Is there a faster library than this? from pdf2image import…
3 votes, 1 answer

Python tempfile.TemporaryDirectory() cleanup crashes with PermissionError and NotADirectoryError

Premise: I'm trying to convert some PDFs to images via pdf2image and poppler, to then run some computer vision tasks on them. The conversion itself works fine. However, the conversion creates some artifacts for each page in the pdf as it is being…
Liqs
3 votes, 0 answers

Getting 'UnidentifiedImageError: cannot identify image file error' while converting pdf to image on google colab

I am using pdf2image to convert a PDF file into an image, using the method convert_from_path. However, I keep getting the above-mentioned error on Google Colab. Surprisingly, this does not happen when I execute the same code in a Jupyter notebook on…
2 votes, 1 answer

pdf2image fails in docker container

I have a Python project running in a Docker container, but I can't get convert_from_path (from the pdf2image library) to work. It works locally on my Windows PC, but not in the Linux-based Docker container. The error I get each time is Unable to get…
qoob
2 votes, 3 answers

pdf2image: how to remove the '0001' in jpg file names?

My goal is to convert a multi-page PDF file into a number of .jpg files, in such a way that the images are written directly to the hard disk/SSD instead of being stored in memory. In Python 3.11: from pdf2image import convert_from_path poppler_path =…
2 votes, 0 answers

'Is poppler installed and in PATH' - Running pdf2Image script in android

I have a custom Python script using pdf2image which I am trying to run on an Android phone. I have tried these two ways: (1) use an Android app that can run Python scripts on Android; (2) create an Android app and integrate my Python code (using…
2 votes, 1 answer

pdf2image outputs blank images on certain pdfs

I have code that converts PDFs to PNG files successfully on almost all PDFs, but I've been trying to convert this one, and it only saves blank images of each page. Note that I am using Windows 10 to do this. I can successfully get pdf2image to…
2 votes, 1 answer

How to extract text boxes from a pdf and convert them to image

I'm trying to get cropped boxes from a PDF that contains text. This would be very useful for gathering training data for one of my models, which is why I need it. Here's a pdf…
Tom
2 votes, 3 answers

Why is pdf2image giving me a blank image file?

I am trying to perform OCR using Tesseract OCR on multiple big PDF files (~400-600 pages). I don't necessarily want to extract text from all pages; I just want a few pages (page numbers are known). The PDF file seems to have some sort of OCR…
Vedant Jumle
1 vote, 1 answer

Pdf not being converted into JPEG image using pdf2image

I am using the following code to save PDF pages as images, but it is not storing them as JPEG files; it is storing them as PPM files. How do I solve this? from pdf2image import convert_from_path pages = convert_from_path(path_to_pdf, output_folder=path_to_output,…
ClawX69
1 vote, 1 answer

I am not able to convert the pdf into png images by using convert_from_path() method of pdf2image

I want to convert the PDF pages into PNG and TIF images. I am passing fmt = png / fmt = tif, but I still get the resulting images in JPEG format only. Please help me get the correct output. images = convert_from_path( …
1 vote, 1 answer

Converting PDF page to JPG returns blank

I have a function that asks the user for a PDF file and receives the page number the user wishes to convert into an image. The function usually works fine; however, with a few PDFs it does not work: the image that is returned is blank and it has 4 mega…
Hugo Pinho
1 vote, 2 answers

Poppler Installation on Google Colab

I am trying to convert a PDF to images using the pdf2image module on Google Colab. I have downloaded the latest version of poppler and also installed poppler-utils. In convert_from_path(), I passed the correct path to poppler's bin directory, but still I'm…
1 vote, 0 answers

How to use PDF to PNG format for OCR without saving each page as PNG?

I am using OCR to scan invoices, and I have a large collection of PDFs. The code I am using to convert the PDF to PNG is the following: import fitz file_path = "my_file.pdf" dpi = 500 zoom = dpi / 72 # zoom factor, standard: 72 dpi magnify =…