
I want to convert an image-based PDF to an image file (.png/.jpg) in Python, so that I can then use the image to extract tabular data from it. I do not want to run the code from the command line.

I am currently using Python 3.7.1 and the PyCharm IDE.

I have tried the code from the Stack Overflow question "Extracting images from pdf using Python", but nothing works: it runs, but it cannot extract the image from an image-based PDF file.

I also tried the code from this dzone.com article, but it did not work either: https://dzone.com/articles/exporting-data-from-pdfs-with-python

Here are links to the image-based PDF files:

link1: https://www.molex.com/pdm_docs/sd/190390001_sd.pdf

link2: https://www.te.com/commerce/DocumentDelivery/DDEController?Action=showdoc&DocId=Customer+Drawing%7FDT04-12PX-C015%7F-%7Fpdf%7FEnglish%7FENG_CD_DT04-12PX-C015_-.pdf%7FDT04-12PA-C015

Please suggest a solution.

Vishal
  • Does this answer your question? [Convert PDF to Image using Python](https://stackoverflow.com/questions/60701262/convert-pdf-to-image-using-python) – Joe Apr 24 '20 at 05:29
  • https://stackoverflow.com/questions/46184239/extract-a-page-from-a-pdf-as-a-jpeg – Joe Apr 24 '20 at 05:30
  • Thank you, Joe. This link is very helpful; I had been searching for this for a long time. – Vishal Apr 24 '20 at 07:25
  • If it is a solution to your question, please close or delete this question. – Joe Apr 24 '20 at 09:03


The pdf2image library converts PDF pages to images. Looking at your PDFs, they contain nothing but images, so you can simply convert each page to an image.

Install

pip install pdf2image
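pdf2image does the actual rendering through Poppler's command-line tools (pdftoppm, plus pdfinfo to count pages), so `pip install pdf2image` alone is not enough: Poppler itself must also be installed. On Windows, if Poppler's bin folder is not on PATH, `convert_from_path` accepts a `poppler_path` argument pointing at that folder. A minimal sketch for checking, from Python rather than the command line, whether the tools can be found might look like this; the helper name `find_poppler_tool` is hypothetical.

```python
import shutil
from typing import Optional


def find_poppler_tool(name: str = "pdftoppm") -> Optional[str]:
    """Return the full path of a Poppler command-line tool, or None if it is not on PATH.

    Hypothetical helper: pdf2image shells out to pdftoppm (and pdfinfo for page
    counts), so both must be discoverable unless poppler_path is passed explicitly.
    """
    return shutil.which(name)


if __name__ == "__main__":
    for tool in ("pdftoppm", "pdfinfo"):
        location = find_poppler_tool(tool)
        print(f"{tool}: {location or 'NOT FOUND - install Poppler or pass poppler_path'}")
```

If either tool is reported as not found, install Poppler and either add its bin folder to PATH or pass that folder as `poppler_path`.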

Once it is installed, you can use the following code to get the images.

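from pdf2image import convert_from_path

# Render every page at 500 DPI; replace 'pdf_file' with the path to your PDF.
# The result is a list of PIL images, one per page.
pages = convert_from_path('pdf_file', 500)

# Save each page in JPEG format under its own file name
# (saving every page as 'out.jpg' would keep overwriting the same file)
for i, page in enumerate(pages, start=1):
    page.save(f'out_{i}.jpg', 'JPEG')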
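Since you asked for .png or .jpg files, the same idea can be wrapped in a small function that writes one file per page into a folder and lets the file extension pick the format. This is a sketch, not part of pdf2image: the names `page_image_paths` and `pdf_to_images`, the `<pdf name>_page<N>` naming scheme, and the 300 DPI default are all assumptions. A higher DPI gives sharper images for the later table extraction, at the cost of larger files.

```python
from pathlib import Path
from typing import List


def page_image_paths(pdf_path: str, page_count: int, out_dir: str = ".", ext: str = "png") -> List[Path]:
    """Build one output path per page, e.g. pages/drawing_page1.png, pages/drawing_page2.png."""
    stem = Path(pdf_path).stem
    return [Path(out_dir) / f"{stem}_page{i}.{ext}" for i in range(1, page_count + 1)]


def pdf_to_images(pdf_path: str, out_dir: str = ".", dpi: int = 300, ext: str = "png") -> List[Path]:
    """Render every page of pdf_path and save each page as its own image file (hypothetical helper)."""
    # Imported here so page_image_paths works without pdf2image installed;
    # the rendering itself still needs pdf2image plus Poppler.
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL.Image objects
    paths = page_image_paths(pdf_path, len(pages), out_dir, ext)
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for page, path in zip(pages, paths):
        page.save(path)  # Pillow infers PNG or JPEG from the file extension
    return paths
```

For example, after downloading the first PDF locally, `pdf_to_images("190390001_sd.pdf", out_dir="pages")` would write `pages/190390001_sd_page1.png` and so on, and return the list of paths for further processing.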
Kuldeep Singh Sidhu