This link shows how pdf
s could be converted to images. Is there a way to zoom my pdf
s before converting to images? In my project, i am converting pdf
s to png
s and then using Python-tesseract
library to extract text. I noticed that if I zoom pdf
s and then save parts as png
s then OCR provides much better results. So is there a way to zoom pdfs before converting to pngs?
Asked
Active
Viewed 4,447 times
7

user2543622
- 5,760
- 25
- 91
- 159
-
1You can also create a High resolution image from PDF and crop the image section which you need to OCR. – flamelite Mar 27 '19 at 13:16
-
could you show how to do that? I need to do OCR on the whole page – user2543622 Mar 27 '19 at 14:51
-
could you provide some pdf examples? – Liam Mar 28 '19 at 17:44
1 Answers
13
I think that raising the quality (resolution) of your image is a better solution than zooming into the pdf.
using pdf2image
you can accomplish this quite easily:
install pdf2image: pip install pdf2image
then, in python, convert your pdf into a high quality image:
from pdf2image import convert_from_path
pages = convert_from_path('sample.pdf', 400) #400 is the Image quality in DPI (default 200)
pages[0].save("sample.png")
by playing around with the quality parameter you should get the result you desider

Liam
- 6,009
- 4
- 39
- 53
-
1any idea how high we could go on DPI? please provide a documentation link. I tried python help but it was that useful – user2543622 Mar 28 '19 at 23:32
-
1@user2543622 it depends on the pdf you are working on and your computer memory, here is the wikipedia link for dpi:https://en.wikipedia.org/wiki/Dots_per_inch – Liam Mar 29 '19 at 19:46