7

This link shows how pdfs could be converted to images. Is there a way to zoom my pdfs before converting to images? In my project, i am converting pdfs to pngs and then using Python-tesseract library to extract text. I noticed that if I zoom pdfs and then save parts as pngs then OCR provides much better results. So is there a way to zoom pdfs before converting to pngs?

user2543622
  • 5,760
  • 25
  • 91
  • 159

1 Answers1

13

I think that raising the quality (resolution) of your image is a better solution than zooming into the pdf.

using pdf2image you can accomplish this quite easily:

install pdf2image: pip install pdf2image

then, in python, convert your pdf into a high quality image:

from pdf2image import convert_from_path

pages = convert_from_path('sample.pdf', 400) #400 is the Image quality in DPI (default 200)

pages[0].save("sample.png")

by playing around with the quality parameter you should get the result you desider

Liam
  • 6,009
  • 4
  • 39
  • 53
  • 1
    any idea how high we could go on DPI? please provide a documentation link. I tried python help but it was that useful – user2543622 Mar 28 '19 at 23:32
  • 1
    @user2543622 it depends on the pdf you are working on and your computer memory, here is the wikipedia link for dpi:https://en.wikipedia.org/wiki/Dots_per_inch – Liam Mar 29 '19 at 19:46