I installed the OCRmyPDF package in a conda environment that I have been using with pytesseract. When I ran the command "ocrmypdf --help" I received the following error:
[WinError 2] The system cannot find the file specified
Traceback (most recent call last):
File "c:\users\{user}\anaconda3\envs\tesseract\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\{user}\anaconda3\envs\tesseract\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\{user}\Anaconda3\envs\tesseract\Scripts\ocrmypdf.exe\__main__.py", line 4, in <module>
File "c:\users\{user}\anaconda3\envs\tesseract\lib\site-packages\ocrmypdf\__init__.py", line 10, in <module>
from ocrmypdf import helpers, hocrtransform, leptonica, pdfa, pdfinfo
File "c:\users\{user}\anaconda3\envs\tesseract\lib\site-packages\ocrmypdf\leptonica.py", line 44, in <module>
raise MissingDependencyError(
ocrmypdf.exceptions.MissingDependencyError:
---------------------------------------------------------------------
This error normally occurs when ocrmypdf can't find the Leptonica
library, which is usually installed with Tesseract OCR. It could be that
Tesseract is not installed properly, we can't find the installation
on your system PATH environment variable.
The library we are looking for is usually called:
liblept-5.dll (Windows)
liblept*.dylib (macOS)
liblept*.so (Linux/BSD)
Please review our installation procedures to find a solution:
https://ocrmypdf.readthedocs.io/en/latest/installation.html
---------------------------------------------------------------------
Before anyone asks: yes, I do have Tesseract installed, and I have used it successfully through pytesseract. I suspect the issue is that I installed Tesseract with conda, which placed it inside my environment, rather than downloading the source and compiling it directly on Windows. With pytesseract, I can set the location of the Tesseract executable that pytesseract calls as 'tesseract' by placing
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\{user}\Anaconda3\envs\tesseract\Library\bin\tesseract.exe'
in the script. I have searched the OCRmyPDF docs and the source code for a variable or command-line argument that would let me set the location in the same way, but without success. Is there a similar workaround, or do I have to compile Tesseract directly on Windows for OCRmyPDF to work?
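Since the error message names the library file it expects (liblept-5.dll on Windows), one thing that can be checked is whether the conda environment contains a Leptonica library at all, and under what file name. Below is a minimal sketch of such a check. The function name is hypothetical, and matching on the substring "lept" is my own assumption about how the file might be named; it says nothing about how OCRmyPDF itself searches.

```python
from pathlib import Path


def find_leptonica_candidates(directory):
    """Return the sorted names of files in `directory` whose name contains 'lept'.

    Hypothetical helper for diagnosis only. The substring match is an
    assumption; OCRmyPDF's own lookup may use different rules.
    """
    folder = Path(directory)
    if not folder.is_dir():
        # A missing directory simply yields no candidates.
        return []
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_file() and "lept" in entry.name.lower()
    )
```

Calling it on the directory that holds tesseract.exe, for example `find_leptonica_candidates(r'C:\Users\{user}\Anaconda3\envs\tesseract\Library\bin')`, would show whether a file named liblept-5.dll is present or whether conda ships the library under a different name.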
I also saw a thread saying that I can add the conda environment to my system's PATH. I am not sure whether that would let OCRmyPDF find the Tesseract and Leptonica packages and solve the problem, or whether it would cause other issues. I honestly do not know what would happen, as I have very limited knowledge of Windows from a programming standpoint.
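To make the PATH idea concrete without touching the system-wide setting, here is a minimal sketch that prepends a directory to PATH for a single child process only. The function names are hypothetical. It is also only an assumption, based on the wording of the error message, that OCRmyPDF looks for Leptonica on PATH; I have not confirmed that this resolves the error.

```python
import os
import subprocess


def env_with_prepended_path(extra_dir, base_env=None):
    """Return a copy of the environment with `extra_dir` placed first on PATH.

    The current process and the system-wide PATH are left unchanged.
    """
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("PATH", "")
    env["PATH"] = extra_dir + (os.pathsep + current if current else "")
    return env


def run_ocrmypdf_help(conda_bin_dir):
    """Run `ocrmypdf --help` in a child process that sees the modified PATH.

    Assumption: ocrmypdf is launchable from the active environment.
    """
    return subprocess.run(
        ["ocrmypdf", "--help"],
        env=env_with_prepended_path(conda_bin_dir),
        capture_output=True,
        text=True,
    )
```

For example, `run_ocrmypdf_help(r'C:\Users\{user}\Anaconda3\envs\tesseract\Library\bin')` would return a result whose `returncode` and `stderr` show whether the MissingDependencyError still occurs. Because the change applies only to that child process, it should not affect other programs on the machine.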