Python 2.7: Difficulty using pypdfocr for Windows 7

Question

I am trying to use pypdfocr in Windows 7 with Python 2.7.

This is the error message I get when I run pypdfocr in cmd:

C:\Users\chamar.stu>pypdfocr F:\test2.pdf
Starting conversion of F:\test2.pdf
'pdfimages' is not recognized as an internal or external command,
operable program or batch file.
WARNING: Could not execute pdfimages to calculate DPI (try installing xpdf or poppler?), so defaulting to 300dpi
Traceback (most recent call last):
  File "c:\users\chamar.stu\appdata\local\continuum\anaconda2\lib\runpy.py", line 174, in _run_module_as_main
  ... .... ....

pypdfocr\pypdfocr_tesseract.py", line 98, in _is_version_uptodate
    ver = [int(x) for x in ver_str.split('.')]
ValueError: invalid literal for int() with base 10: '00alpha'

It seems that I am missing Poppler or XPDF, but I did install Poppler via PyGObject as suggested here. I've also added xpdf to my PATH environment variable as suggested here.

Any suggestions to get me out of this little mess?

score 1 · Answer 1 · answered Mar 17 '17 at 09:02

The pypdfocr script is probably calling the pdfimages program (one of the poppler utilities, not the library) using the subprocess module.
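A minimal sketch of that kind of call, assuming pypdfocr invokes the program roughly like this (its actual source is not shown here). `try_pdfimages_list` is a hypothetical helper; `-list` is the poppler `pdfimages` option that prints information about each image, including its resolution.

```python
import subprocess


def try_pdfimages_list(pdf_path, exe="pdfimages"):
    """Run `pdfimages -list <pdf_path>` and return its text output.

    Returns None if the program cannot be started (for example because it
    is not on PATH) or if it exits with an error.
    """
    try:
        # Argument list, no shell: the OS searches PATH for `exe` directly.
        return subprocess.check_output([exe, "-list", pdf_path],
                                       universal_newlines=True)
    except OSError:
        # The executable could not be found or started at all.
        return None
    except subprocess.CalledProcessError:
        # pdfimages ran but failed, e.g. on an unreadable or missing PDF.
        return None
```

Called with an argument list and no shell, a missing executable raises OSError inside Python rather than printing cmd.exe's "'pdfimages' is not recognized" message; either way the caller ends up in a fallback such as the 300dpi default in the log above.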

I could not easily tell whether the utilities are included in the package at the link you mention.

If not, you can find pre-built MS Windows executables of the utilities, e.g. here.

Make sure that the directory where the poppler utilities are installed is in your PATH, so that pypdfocr can find them.
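To check whether a program is reachable through PATH, here is a rough sketch of the lookup a subprocess call relies on. `find_on_path` is a hypothetical helper; on Python 3.3+ the standard `shutil.which` does the same job.

```python
import os


def find_on_path(name, path=None):
    """Return the full path of executable `name` if it is found in one of
    the directories listed in `path` (default: the PATH variable), else None.

    On Windows the extensions listed in PATHEXT (.EXE, .BAT, ...) are tried too.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    extensions = [""]
    if os.name == "nt":
        extensions += os.environ.get("PATHEXT", ".COM;.EXE;.BAT;.CMD").split(";")
    for directory in path.split(os.pathsep):
        # Entries in a Windows PATH are sometimes wrapped in quotes.
        directory = directory.strip('"')
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None
```

Running `print(find_on_path("pdfimages"))` from the same cmd window you use for pypdfocr shows whether the poppler directory is visible; None means it is not. Changes made to PATH through the Windows system settings only apply to newly started cmd windows.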

Roland Smith

OK, thanks. The link to the Poppler .exe on the website is down; I have to wait for it to come back up. – Plug4 Mar 17 '17 at 11:16

Eduard Florinescu · Answer 2 · 2018-11-03T18:43:11.540

Try downgrading Tesseract from version 4.0.0-beta.1 (in my case) to a 3.x version whose version string contains only numbers.

tesseract --version   # to check the installed version
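The same check can be scripted from Python. This is a sketch with hypothetical helper names, assuming the first line of the output has the form "tesseract 4.0.0-beta.1"; stderr is merged into stdout in case a release prints the version there.

```python
import subprocess


def tesseract_version_string(output):
    """Extract the version token from `tesseract --version` output.

    Assumes the first non-empty line looks like "tesseract 4.0.0-beta.1".
    """
    lines = output.strip().splitlines()
    if not lines:
        return None
    parts = lines[0].split()
    if len(parts) < 2 or parts[0].lower() != "tesseract":
        return None
    return parts[1]


def get_tesseract_version(exe="tesseract"):
    """Run `<exe> --version` and return the version string, or None."""
    try:
        proc = subprocess.Popen([exe, "--version"],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT,
                                universal_newlines=True)
    except OSError:
        # tesseract is not installed or not on PATH.
        return None
    out, _ = proc.communicate()
    return tesseract_version_string(out)
```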

The version check built into the pypdfocr package expects every dot-separated part of the version number to be an integer, hence the error on '00alpha' ('0-beta' in my case).
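The failure is easy to reproduce. `parse_version_strict` below uses the same expression as the line in the traceback; `parse_version_lenient` is a hypothetical workaround, not part of pypdfocr, that keeps the leading digits of each part and stops at the first pre-release suffix. A version string such as "4.00.00alpha" is assumed here as an example of one that yields the '00alpha' part.

```python
import re


def parse_version_strict(ver_str):
    # Same expression as in the traceback: every dot-separated part must be
    # a plain integer, so '00alpha' or '0-beta' raises ValueError.
    return [int(x) for x in ver_str.split('.')]


def parse_version_lenient(ver_str):
    # Hypothetical workaround: keep the leading digits of each part and stop
    # at the first part that carries a suffix such as 'alpha' or '-beta'.
    ver = []
    for part in ver_str.split('.'):
        m = re.match(r'\d+', part)
        if m is None:
            break
        ver.append(int(m.group()))
        if m.end() != len(part):
            break
    return ver
```

`parse_version_lenient("4.0.0-beta.1")` returns [4, 0, 0], which can be compared with a minimum version such as [3, 2] using ordinary list comparison. Patching the installed pypdfocr along these lines is an alternative to downgrading, at the cost of modifying a third-party package.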

edited Nov 03 '18 at 18:43

answered Nov 03 '18 at 18:33
