Highest Voted 'pdf2image' Questions

12 votes, 3 answers

Converting pdf to png with python (without pdf2image)

I want to convert a one-page PDF into a PNG file. I installed pdf2image and got this error: poppler is not installed on Windows. According to the question "Poppler in path for pdf2image", Poppler should be installed and PATH modified. I cannot do…

asked Oct 20 '21 at 10:01 by JFerro
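
One way to avoid Poppler entirely is PyMuPDF (imported as fitz, the same module used in the last question on this page), which renders pages itself. A minimal sketch, assuming PyMuPDF 1.19.2 or newer for the dpi= argument of Page.get_pixmap; the function names pdf_first_page_to_png and default_png_path are hypothetical:

```python
from pathlib import Path
from typing import Optional


def default_png_path(pdf_path: str) -> str:
    """Hypothetical helper: 'scan.pdf' -> 'scan.png' (same folder, same stem)."""
    return str(Path(pdf_path).with_suffix(".png"))


def pdf_first_page_to_png(pdf_path: str, png_path: Optional[str] = None, dpi: int = 150) -> str:
    """Render page 1 of a PDF to a PNG file with PyMuPDF; no Poppler install needed."""
    import fitz  # PyMuPDF, imported lazily so this module loads even without it

    png_path = png_path or default_png_path(pdf_path)
    with fitz.open(pdf_path) as doc:
        page = doc.load_page(0)         # pages are zero-based in PyMuPDF
        pix = page.get_pixmap(dpi=dpi)  # rasterize at the requested resolution
        pix.save(png_path)              # output format follows the .png suffix
    return png_path
```

For example, pdf_first_page_to_png("page.pdf", dpi=200) would write page.png next to the input file.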

4 votes, 3 answers

What is the fastest way to convert a PDF to a JPG image?

I am trying to convert multiple PDFs (10k+) to JPG images and extract text from them. I am currently using the pdf2image Python library, but it is rather slow. Is there a faster library than this? from pdf2image import…

python imagemagick ghostscript text-extraction pdf2image

asked Aug 25 '22 at 05:07 by Sahil Lohiya
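
pdf2image shells out to Poppler's pdftoppm, so the heavy work happens in child processes and a plain thread pool is enough to keep several conversions running at once. A minimal sketch for a large batch, assuming pages are written to disk via output_folder; convert_many and the one-subfolder-per-PDF layout are hypothetical, and dpi=150 with fmt="jpeg" are illustrative choices (a lower DPI is usually the biggest speed lever). If the PDFs already contain a text layer, extracting the text directly (for example with Poppler's pdftotext) skips rasterizing altogether.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, List, Optional


def _pdf2image_convert(pdf_path: str, out_dir: str) -> List:
    from pdf2image import convert_from_path  # lazy import (third-party)

    return convert_from_path(
        pdf_path,
        dpi=150,                # pdf2image's default is 200; fewer pixels = faster
        output_folder=out_dir,  # write pages to a folder instead of directly in memory
        fmt="jpeg",             # default is "ppm", which is uncompressed and large
        thread_count=2,         # pdftoppm processes per PDF
    )


def convert_many(
    pdf_paths: Iterable[str],
    out_root: str,
    workers: int = 4,
    convert: Optional[Callable[[str, str], List]] = None,
) -> Dict[str, List]:
    """Convert many PDFs concurrently; each PDF gets its own subfolder of out_root."""
    convert = convert or _pdf2image_convert

    def one(pdf_path: str):
        out_dir = Path(out_root) / Path(pdf_path).stem  # note: equal stems share a folder
        out_dir.mkdir(parents=True, exist_ok=True)
        return pdf_path, convert(pdf_path, str(out_dir))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(one, pdf_paths))
```

The convert parameter exists only so the batching logic can be exercised without Poppler; in normal use it is left as None.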

3 votes, 1 answer

Python tempfile.TemporaryDirectory() cleanup crashes with PermissionError and NotADirectoryError

Premise: I'm trying to convert some PDFs to images via pdf2image and Poppler, to then run some computer vision tasks on them. The conversion itself works fine. However, the conversion creates some artifacts for each page in the PDF as it is being…

python temporary-files code-cleanup pdf2image

asked Nov 03 '22 at 09:57 by Liqs
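
A likely cause, especially on Windows, is that the images returned by convert_from_path(..., output_folder=...) are backed by the files in the temporary directory, and Windows refuses to delete files whose handles are still open. A minimal sketch, assuming the per-page work can run (and its results be collected) inside the with block; convert_in_tempdir and the process callback are hypothetical. On Python 3.10+, TemporaryDirectory(ignore_cleanup_errors=True) is another option, at the cost of possibly leaving stray files behind.

```python
import tempfile
from typing import Any, Callable, List, Optional


def convert_in_tempdir(
    pdf_path: str,
    process: Callable[[Any], Any],
    convert: Optional[Callable[..., List]] = None,
) -> List[Any]:
    """Render pages into a temp dir, process them, and close them before cleanup."""
    if convert is None:
        from pdf2image import convert_from_path as convert  # lazy import

    with tempfile.TemporaryDirectory() as tmp:
        images = convert(pdf_path, output_folder=tmp)
        try:
            # Return plain results (text, arrays, ...), not the images themselves.
            return [process(img) for img in images]
        finally:
            for img in images:
                img.close()  # release the file handle so the directory can be removed
```

The finally clause runs before the with block exits, so every image is closed before TemporaryDirectory tries to delete its contents.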

3 votes, 0 answers

Getting 'UnidentifiedImageError: cannot identify image file' error while converting PDF to image on Google Colab

I am using pdf2image to convert a PDF file into an image with the convert_from_path method. However, I keep getting the above-mentioned error on Google Colab. Surprisingly, this does not happen when I execute the same code in a Jupyter notebook on…

nlp google-colaboratory data-preprocessing pdf2image

asked Jul 15 '22 at 07:23 by Ace Purohit

2 votes, 1 answer

pdf2image fails in docker container

I have a Python project running in a Docker container, but I can't get convert_from_path (from the pdf2image library) to work. It works locally on my Windows PC, but not in the Linux-based Docker container. The error I get each time is Unable to get…

python docker docker-compose poppler pdf2image

asked Apr 10 '23 at 17:44 by qoob
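
pdf2image raises its "Is poppler installed and in PATH?" error when it cannot run Poppler's pdfinfo, which it uses to count pages before rendering with pdftoppm. Slim Linux base images usually ship without Poppler; on Debian/Ubuntu-based images it is packaged as poppler-utils (in a Dockerfile, something like RUN apt-get update && apt-get install -y poppler-utils). A small startup check makes the failure explicit. A sketch; missing_poppler_tools and require_poppler are hypothetical helpers:

```python
import shutil
from typing import Callable, List, Optional

# Executables pdf2image runs by default: pdfinfo (page count) and pdftoppm (rendering).
POPPLER_TOOLS = ("pdfinfo", "pdftoppm")


def missing_poppler_tools(
    poppler_path: Optional[str] = None,
    which: Callable[..., Optional[str]] = shutil.which,
) -> List[str]:
    """Return the Poppler executables that cannot be found (an empty list means OK)."""
    return [tool for tool in POPPLER_TOOLS if which(tool, path=poppler_path) is None]


def require_poppler(
    poppler_path: Optional[str] = None,
    which: Callable[..., Optional[str]] = shutil.which,
) -> None:
    """Fail fast at startup with an actionable message instead of a page-count error."""
    missing = missing_poppler_tools(poppler_path, which)
    if missing:
        raise RuntimeError(
            "Poppler not found (missing: " + ", ".join(missing) + "); "
            "on Debian/Ubuntu images install it with: apt-get install -y poppler-utils"
        )
```

Calling require_poppler() once when the container starts turns a confusing conversion failure into a message that names the missing binaries.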

2 votes, 3 answers

pdf2image: how to remove the '0001' in jpg file names?

My goal is to convert a multi-page PDF file into a number of .jpg files in such a way that the images are written directly to the hard disk/SSD instead of being stored in memory. In Python 3.11: from pdf2image import convert_from_path poppler_path =…

pdf2image

asked Jan 06 '23 at 14:19 by Gardener_NL
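
The '0001' appears to come from pdf2image's own naming: a string output_file is turned into a prefix plus a zero-padded counter, and pdftoppm then appends its own -<page> suffix, giving names such as page0001-1.jpg. Rather than relying on undocumented naming hooks, a post-processing rename is a safe fallback. A sketch, assuming that <prefix><4 digits>-<page>.<ext> pattern; strip_counter and rename_outputs are hypothetical helpers:

```python
import re
from pathlib import Path
from typing import List

# Assumed pdf2image/pdftoppm naming: <prefix><4-digit counter>-<page>.<ext>
_COUNTER = re.compile(r"^(?P<prefix>.*)\d{4}(?P<rest>-\d+\.[A-Za-z]+)$")


def strip_counter(name: str) -> str:
    """'page0001-1.jpg' -> 'page-1.jpg'; names that don't match are returned unchanged."""
    m = _COUNTER.match(name)
    return m.group("prefix") + m.group("rest") if m else name


def rename_outputs(folder: str) -> List[str]:
    """Rename every matching file in folder in place; returns the new names."""
    renamed = []
    for path in sorted(Path(folder).iterdir()):  # list is built before any rename
        new_name = strip_counter(path.name)
        if new_name != path.name:
            path.rename(path.with_name(new_name))
            renamed.append(new_name)
    return renamed
```

The greedy prefix group means only the last four digits before the dash are removed, so a prefix that itself ends in digits (for example scan2023) is kept intact.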

2 votes, 0 answers

'Is poppler installed and in PATH' - Running pdf2Image script in android

I have a custom Python script using pdf2image which I am trying to run on an Android phone. I have tried these two ways: using an Android app that can run Python scripts on Android, and creating an Android app that integrates my Python code (using…

python android android-studio chaquopy pdf2image

asked Aug 26 '21 at 20:39 by Aakankasha Sharma

2 votes, 1 answer

pdf2image outputs blank images on certain pdfs

I have code that converts PDFs to PNG files successfully on almost all PDFs, but I've been trying to convert this one, and it only saves blank images of each page. Note that I am using Windows 10 to do this. I can successfully get pdf2image to…

pdf2image

asked Jun 24 '21 at 21:51 by Jesse Vincent
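
When a few PDFs render blank with the default pdftoppm backend, one thing worth trying is pdf2image's use_pdftocairo=True option, which renders through Poppler's pdftocairo instead; whether that fixes a particular file is not guaranteed. A minimal sketch that detects a uniform (blank) page and retries that page once with pdftocairo; is_blank and render_page are hypothetical helpers:

```python
from typing import Any, Callable, Optional


def is_blank(image: Any) -> bool:
    """True if every pixel has the same grayscale value (e.g. an all-white page)."""
    low, high = image.convert("L").getextrema()
    return low == high


def render_page(
    pdf_path: str,
    page: int,
    dpi: int = 200,
    convert: Optional[Callable[..., list]] = None,
) -> Any:
    """Render one 1-based page; if it comes back blank, retry with pdftocairo."""
    if convert is None:
        from pdf2image import convert_from_path as convert  # lazy import

    image = convert(pdf_path, dpi=dpi, first_page=page, last_page=page)[0]
    if is_blank(image):
        image = convert(pdf_path, dpi=dpi, first_page=page, last_page=page,
                        use_pdftocairo=True)[0]
    return image
```

Checking extrema on a grayscale copy is cheap and avoids iterating over pixels in Python.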

2 votes, 1 answer

How to extract text boxes from a pdf and convert them to image

I'm trying to get cropped boxes of text from a PDF. This will be very useful for gathering training data for one of my models, which is why I need it. Here's a PDF…

python pdf text-extraction pdfminer pdf2image

asked Jun 16 '21 at 12:49 by Tom
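
The fiddly part is the coordinate systems: pdfminer reports each layout element's bbox as (x0, y0, x1, y1) in PDF points (1/72 inch) with the origin at the bottom-left of the page, while PIL's crop() expects (left, top, right, bottom) in pixels measured from the top-left. A sketch of the conversion plus a crop loop, assuming unrotated pages whose page box starts at (0, 0) and a page rendered by pdf2image at a known dpi; bbox_to_pixels and crop_text_boxes are hypothetical names:

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]


def bbox_to_pixels(bbox: Box, page_height: float, dpi: int) -> Tuple[int, int, int, int]:
    """Convert a pdfminer bbox (points, origin bottom-left) to a PIL crop box (pixels, origin top-left)."""
    x0, y0, x1, y1 = bbox
    scale = dpi / 72.0  # PDF user space is 72 points per inch
    return (
        round(x0 * scale),
        round((page_height - y1) * scale),  # top edge: flip the y axis
        round(x1 * scale),
        round((page_height - y0) * scale),  # bottom edge
    )


def crop_text_boxes(pdf_path: str, dpi: int = 200) -> List:
    """Return one cropped PIL image per pdfminer text box, page by page."""
    from pdf2image import convert_from_path        # lazy imports: both are
    from pdfminer.high_level import extract_pages  # third-party packages
    from pdfminer.layout import LTTextBox

    crops = []
    for page_no, layout in enumerate(extract_pages(pdf_path), start=1):
        image = convert_from_path(pdf_path, dpi=dpi, first_page=page_no, last_page=page_no)[0]
        for element in layout:
            if isinstance(element, LTTextBox):
                crops.append(image.crop(bbox_to_pixels(element.bbox, layout.height, dpi)))
    return crops
```

For a US Letter page (792 pt tall) rendered at 144 dpi, the box (72, 692, 144, 720) maps to the pixel box (144, 144, 288, 200).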

2 votes, 3 answers

Why is pdf2image giving me a blank image file?

I am trying to perform OCR using Tesseract on multiple big PDF files (~400-600 pages). I don't want to extract text from all pages, just a few (the page numbers are known). The PDF file seems to have some sort of OCR…

python pdf ocr pdftoppm pdf2image

asked Jun 06 '21 at 17:01 by Vedant Jumle
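
For OCR on a handful of known pages, pdf2image's first_page and last_page arguments let pdftoppm render only those pages instead of the whole 400-600 page file. A sketch that groups the wanted pages into contiguous runs, so each run costs one pdftoppm call, and OCRs them with pytesseract; page_runs and ocr_pages are hypothetical names, and pytesseract plus the Tesseract binary are assumed to be installed:

```python
from typing import Dict, Iterable, List, Tuple


def page_runs(pages: Iterable[int]) -> List[Tuple[int, int]]:
    """Group 1-based page numbers into inclusive (first, last) runs: [3, 4, 5, 9] -> [(3, 5), (9, 9)]."""
    runs: List[Tuple[int, int]] = []
    for p in sorted(set(pages)):
        if runs and p == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], p)  # extend the current run
        else:
            runs.append((p, p))          # start a new run
    return runs


def ocr_pages(pdf_path: str, pages: Iterable[int], dpi: int = 300) -> Dict[int, str]:
    """OCR only the requested pages; returns {page_number: text}."""
    import pytesseract                       # lazy imports (third-party)
    from pdf2image import convert_from_path

    texts: Dict[int, str] = {}
    for first, last in page_runs(pages):
        images = convert_from_path(pdf_path, dpi=dpi, first_page=first, last_page=last)
        for page_no, image in zip(range(first, last + 1), images):
            texts[page_no] = pytesseract.image_to_string(image)
    return texts
```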

1 vote, 1 answer

Pdf not being converted into JPEG image using pdf2image

I am using the following code to save PDF pages as images, but it is not storing them as JPEG; it stores them as PPM files. How do I solve this? from pdf2image import convert_from_path pages = convert_from_path(path_to_pdf, output_folder=path_to_output,…

python pdf pdf2image

asked Aug 04 '23 at 10:18 by ClawX69
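
pdf2image's fmt argument defaults to "ppm", which is why pages written to output_folder end up as .ppm files; passing fmt="jpeg" makes pdftoppm write JPEG files instead (the jpegopt argument can further tune quality). A minimal sketch; save_pages_as_jpeg is a hypothetical wrapper and output_file="page" is an illustrative name prefix:

```python
from pathlib import Path
from typing import Callable, List, Optional


def save_pages_as_jpeg(
    pdf_path: str,
    out_dir: str,
    convert: Optional[Callable[..., List]] = None,
) -> List[Path]:
    """Write every page of pdf_path into out_dir as JPEG and return the new files."""
    if convert is None:
        from pdf2image import convert_from_path as convert  # lazy import

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    convert(
        pdf_path,
        output_folder=out_dir,
        fmt="jpeg",          # default is "ppm"
        output_file="page",  # file name prefix
    )
    return sorted(Path(out_dir).glob("page*.jpg"))
```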

1 vote, 1 answer

I am not able to convert the pdf into png images by using convert_from_path() method of pdf2image

I want to convert PDF pages into PNG and TIFF images. I am passing fmt = png / fmt = tif, but I still get the resulting images in JPEG format only. Please help me get the correct output. images = convert_from_path( …

python pdf2image

asked Mar 10 '23 at 08:24 by Sapna Sharma
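
fmt only controls the files pdftoppm writes (and the intermediate format pdf2image reads back); convert_from_path returns PIL Image objects, and if those are later written with Image.save, the file extension or an explicit format= argument decides the final format. That is one plausible reason for ending up with JPEGs, though the excerpt is cut off before any saving code. A sketch that saves each returned page explicitly as PNG or TIFF; save_pages and the page_<n> naming are hypothetical:

```python
from pathlib import Path
from typing import Iterable, List

_PIL_FORMATS = {"png": "PNG", "tif": "TIFF", "tiff": "TIFF"}


def save_pages(images: Iterable, out_dir: str, ext: str = "png") -> List[Path]:
    """Save PIL images as page_1.<ext>, page_2.<ext>, ... with an explicit PIL format."""
    fmt = _PIL_FORMATS[ext.lower()]  # KeyError for unsupported extensions
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, image in enumerate(images, start=1):
        path = out / f"page_{i}.{ext.lower()}"
        image.save(path, format=fmt)  # explicit format, independent of how pages were rendered
        paths.append(path)
    return paths
```

Typical use would be save_pages(convert_from_path("doc.pdf"), "out", ext="tif").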

1 vote, 1 answer

Converting PDF page to JPG returns blank

I have a function that asks the user for a PDF file and receives the page number the user wishes to convert into an image. The function usually works fine; however, with a few PDFs it does not work: the image that is returned is blank and it has 4 mega…

python python-imaging-library pdf2image

asked Nov 03 '22 at 14:28 by Hugo Pinho

1 vote, 2 answers

Poppler Installation on Google Colab

I am trying to convert a PDF to an image using the pdf2image module on Google Colab. I have downloaded the latest version of Poppler and also installed poppler-utils. In convert_from_path(), I passed the correct path to Poppler's bin directory, still I'm…

python google-colaboratory poppler pdf2image poppler-utils

asked Sep 23 '22 at 06:20 by Harsh Dwivedi
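
poppler_path is mainly useful on Windows, where Poppler is usually unpacked into a folder that is not on PATH. On Colab (a Linux VM), installing the poppler-utils package with apt (in a notebook cell, !apt-get install -y poppler-utils) puts pdfinfo and pdftoppm on PATH, so poppler_path can simply be left at its default of None; pointing it at a downloaded Poppler folder only helps if that folder holds binaries built for the same system. A sketch of choosing the argument per platform; poppler_kwargs and the example Windows folder are hypothetical:

```python
import sys
from typing import Dict, Optional


def poppler_kwargs(windows_bin_dir: Optional[str] = None, platform: str = sys.platform) -> Dict[str, str]:
    """Extra keyword arguments for convert_from_path, chosen per operating system."""
    if platform.startswith("win") and windows_bin_dir:
        # Windows: point pdf2image at the unpacked Poppler "bin" folder.
        return {"poppler_path": windows_bin_dir}
    # Linux/macOS (including Colab): rely on pdfinfo/pdftoppm being on PATH,
    # e.g. after: apt-get install -y poppler-utils
    return {}
```

Usage would look like convert_from_path("doc.pdf", **poppler_kwargs(r"C:\poppler\Library\bin")), where that folder is only a placeholder.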

1 vote, 0 answers

How to use PDF to PNG format for OCR without saving each page as PNG?

I am using OCR to scan invoices, and I have a large collection of PDFs. The code I am using to convert the PDF to PNG is the following: import fitz file_path = "my_file.pdf" dpi = 500 zoom = dpi / 72 # zoom factor, standard: 72 dpi magnify =…

python png ocr pdf2image

asked Sep 14 '22 at 17:18 by AScientist1096
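
The rendered page never has to touch the disk: a PyMuPDF Pixmap exposes raw RGB samples that PIL can wrap directly, and pytesseract accepts PIL images. A sketch following the question's own dpi / 72 zoom factor, assuming PyMuPDF, Pillow, pytesseract and Tesseract are installed; ocr_pdf_in_memory and zoom_for_dpi are hypothetical names:

```python
from typing import Iterator


def zoom_for_dpi(dpi: int) -> float:
    """PDF pages are laid out at 72 points per inch, so scale = dpi / 72."""
    return dpi / 72


def ocr_pdf_in_memory(pdf_path: str, dpi: int = 300) -> Iterator[str]:
    """Yield the OCR text of each page without writing any image files."""
    import fitz        # PyMuPDF; lazy imports (third-party)
    import pytesseract
    from PIL import Image

    zoom = zoom_for_dpi(dpi)
    matrix = fitz.Matrix(zoom, zoom)
    with fitz.open(pdf_path) as doc:
        for page in doc:
            pix = page.get_pixmap(matrix=matrix)  # RGB without alpha by default
            image = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            yield pytesseract.image_to_string(image)
```

Because it is a generator, each page's image can be discarded as soon as its text has been produced, which keeps memory flat on long invoices.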
