4

I am using Tesseract OCR for my program and I am going to convert it into a single .exe file using pyinstaller. The problem is that in order for Tesseract to work, I need to reference the path to the program installed on my computer, like this: pytesseract.pytesseract.tesseract_cmd = 'E:\\Tesseract-OCR\\tesseract'

Since this is not just a separate library that can be imported, but a standalone program, I can't pass it to pyinstaller as an '--add-data' argument. How do I make a one-file executable then?

Mirrah
  • Pick one from [`[python] [tesseract] path`](https://stackoverflow.com/search?q=isanswered%3Ayes+is%3Aquestion+%5Bpython%5D+%5Btesseract%5D+path) – stovfl Jan 20 '20 at 19:43
  • No, there's no answer to my question. Tesseract works perfectly, the path is correct. The problem is to shove tesseract together with my program into a single executable. Tesseract isn't a jpg or a text document that I can attach to pyinstaller, and I can't attach an entire folder that contains another program. – Mirrah Jan 20 '20 at 20:10
  • From your question: ***"I need to reference the path "***? It's not `pyinstaller`'s job to set a correct environment path to `tesseract`. You have to do it either on the target computer or within your Python program. – stovfl Jan 20 '20 at 20:15
  • I meant to reference it for pyinstaller. So to use my program it turns out the user will have to install tesseract separately right? – Mirrah Jan 21 '20 at 04:45
  • ***"the user will have to install tesseract separately right? "***: That's one possible solution. As far as I know, `pytesseract` is a wrapper around `tesseract.exe/*.dll` and expects the `*.exe/*.dll` to be in place. I can imagine that `pyinstaller` can bundle all binary files, but you have to set up environment variables so that `pytesseract` is able to find and load them. – stovfl Jan 21 '20 at 08:44

6 Answers

7

Assuming you're on Windows, I ran into this problem and think I solved it by compiling a static version of tesseract (which does not need to be installed) and including its path as a binary in the pyinstaller spec file.

Official compiling instructions here:

https://tesseract-ocr.github.io/tessdoc/Compiling.html#windows

Install MS Visual Studio 15 (with c++) and vcpkg and execute one of the following through command prompt:

for 64-bit: vcpkg install tesseract:x64-windows-static

for 32-bit: vcpkg install tesseract:x86-windows-static

The tesseract executable will be located a few subfolders deep within the vcpkg folder on your PC. Along with that file, you also need to download a .traineddata file and place it in a folder called 'tessdata' in the same directory as the tesseract exe.

Create a pyinstaller spec file and edit the Analysis(binaries=[]) section to include the folder path where tesseract is located (if you're not using a subfolder for tesseract, I think you'd need to add both tesseract.exe and the tessdata subfolder). I also changed include_binaries=True.

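Run pyinstaller on the spec file itself: pyinstaller yourspecfile.spec (the --specpath option only sets the folder where a newly generated spec file is stored, so it does not select an existing spec file).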
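As a rough sketch of what the spec-file entries could look like: Analysis() takes `binaries` and `datas` as lists of (source, destination folder) tuples. The helper name `tesseract_bundle_entries`, the `vendor\tesseract` source folder and the `tess` destination folder below are made up for illustration, not something from this answer.

```python
import os


def tesseract_bundle_entries(tess_dir, dest="tess"):
    """Build (binaries, datas) lists in the (source, destination folder)
    tuple format used by Analysis() in a PyInstaller spec file.

    tess_dir and dest are hypothetical names chosen for this sketch.
    """
    # The statically built executable goes into the bundle as a binary.
    binaries = [(os.path.join(tess_dir, "tesseract.exe"), dest)]
    # Language files go in as data, kept in a 'tessdata' folder next to the exe.
    datas = [
        (
            os.path.join(tess_dir, "tessdata", "*.traineddata"),
            os.path.join(dest, "tessdata"),
        )
    ]
    return binaries, datas


binaries, datas = tesseract_bundle_entries(os.path.join("vendor", "tesseract"))
```

Inside the spec file the two lists would then be passed on as Analysis(..., binaries=binaries, datas=datas, ...).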

I haven't yet tried it on a different PC, so I haven't fully tested that it works as intended. I don't know anything about compiling C++, so there may be additional files or links that tesseract needs and that are only present because I've been testing on the build PC.

Zstr33
  • Let me know if you have issues. It was a pain to figure out so I'm happy to help! – Zstr33 Mar 16 '20 at 20:29
  • I finally installed vcpkg and tesseract-static, but now I cannot figure out how to reference the .exe and .traineddata properly in the code. The example on the website is a bit confusing. I mean, now we don't have the tesseract.exe and traineddata anymore, instead we have a onefile executable script, so I can't do it like pytesseract.pytesseract.tesseract_cmd = 'E:\\Tesseract-OCR\\tesseract' right? How can we reference it properly then? Thanks a lot in advance! – Mirrah Mar 29 '20 at 15:07
  • You still have the tesseract.exe and traineddata files. I keep them in a subfolder with my Python code. With pyinstaller onefile, you also need to use a workaround for file paths, see the answer with the resource_path function here: https://stackoverflow.com/questions/7674790/bundling-data-files-with-pyinstaller-onefile – Zstr33 Apr 01 '20 at 12:12
  • To add to my last comment, when referencing tesseract with the resource_path function you'd do this: pytesseract.pytesseract.tesseract_cmd = resource_path('tess\\tesseract.exe') where 'tess' is the subfolder where I'm keeping tesseract within the main project folder. The reason you have to do this is that the pyinstaller onefile option unpacks everything to a temporary folder when it runs. – Zstr33 Apr 01 '20 at 12:25
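For reference, a minimal sketch of a resource_path helper along the lines of the linked answer. It relies on PyInstaller setting sys._MEIPASS to the temporary unpack folder at run time. The fallback to the current directory for unbundled runs is an assumption of this sketch. The relative path has no leading separator, because os.path.join drops the base folder when given a rooted path.

```python
import os
import sys


def resource_path(relative_path):
    """Return an absolute path to a bundled file.

    Works inside a PyInstaller bundle and in a normal run
    (assuming the script is started from the project folder).
    """
    # PyInstaller's bootloader stores the unpack folder in sys._MEIPASS;
    # the attribute does not exist when running as a plain script.
    base_path = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base_path, relative_path)


# 'tess' is the subfolder that holds tesseract.exe and tessdata.
tesseract_exe = resource_path(os.path.join("tess", "tesseract.exe"))

# With pytesseract installed, the path would then be assigned like this:
# import pytesseract
# pytesseract.pytesseract.tesseract_cmd = tesseract_exe
```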
4

@Zstr33's answer is correct, but it lacked detail. The following instructions have been tested on Windows 10 64-bit. Link to official compiling instructions here: https://tesseract-ocr.github.io/tessdoc/Compiling.html#windows.

Steps:

  1. Install Visual Studio. Make sure to install the following items: select Desktop Development with C++ and Universal Windows Platform Development.

    Then click on the Individual Components tab and select the following:
    NuGet Package Manager, MSVC v142 - VS 2019 C++ x64/x86 build tools, C++ CMake Tools for Windows, MSVC v142 - VS 2019 C++ ARM64 Build Tools, and NuGet targets and build tasks

    You can add whatever other components you want, but those are the ones that are needed to compile tesseract into a static binary. Also, if you don't use English, click on the language packs tab and add the English language pack; this is needed for vcpkg.

  2. Follow the quick start guide for installing vcpkg, found here: https://github.com/microsoft/vcpkg#getting-started.

  3. Navigate to where you copied the vcpkg directory, or add it to path. Then run: vcpkg install tesseract:x64-windows-static for 64-bit, or vcpkg install tesseract:x86-windows-static for 32-bit.

  4. Go to the place where you put the tesseract directory, then to tesseract_x64-windows-static\tools\tesseract for 64-bit, or to tesseract_x86-windows-static\tools\tesseract for 32-bit.

  5. To use with pyinstaller, using --onefile.
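One possible way to do the --onefile step without a spec file, sketched under some assumptions: the helper name `onefile_args` and the folder names are made up, and os.pathsep is used as the source/destination separator (';' on Windows). The accepted separator has differed between PyInstaller releases, so check pyinstaller --help for your version.

```python
import os


def onefile_args(script, tess_dir, dest="tess"):
    """Build a PyInstaller argument list that bundles tesseract.exe and tessdata.

    script, tess_dir and dest are hypothetical values for this sketch.
    """
    sep = os.pathsep  # assumption: ';' on Windows, ':' elsewhere
    exe_src = os.path.join(tess_dir, "tesseract.exe")
    data_src = os.path.join(tess_dir, "tessdata")
    return [
        script,
        "--onefile",
        "--add-binary", exe_src + sep + dest,
        "--add-data", data_src + sep + os.path.join(dest, "tessdata"),
    ]


args = onefile_args("main.py", os.path.join("vendor", "tesseract"))

# With PyInstaller installed, the build could be started from Python:
# import PyInstaller.__main__
# PyInstaller.__main__.run(args)
```

The same values can be typed on the command line after pyinstaller. At run time the bundled files still have to be looked up inside the temporary unpack folder.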
KetZoomer
3

I built my application's exe, which uses Tesseract and EasyOCR, with the following command. Hope this helps.

python -m PyInstaller --paths "fullpath-to-custom-libraries" --add-data "C:\Program Files\Tesseract-OCR;Tesseract-OCR" --collect-all easyocr --onedir -w main.py
1

I did get it to run with Pyinstaller after all.

First, I needed to create 2 Hook files as described here:

https://github.com/jbarlow83/OCRmyPDF/issues/659#issuecomment-714479684

Then, when running the exe, I still got an error missing pikepdf._cpphelpers

To solve that, just add

from pikepdf import _cpphelpers

in your python file as described here:

How to fix a pyinstaller 'no module named...' error when my script imports the modules pikepdf and pdfminer3?

My Pyinstaller call looks like this:

pyinstaller --onefile appname.py --paths="C:\python\anaconda3\envs\appname\Lib\site-packages" --additional-hooks-dir="C:\coding\appname\Hooks"
jay
0

Since bundling everything up with pyinstaller can be a real pain, I took the following steps:

  1. Imported pytesseract in my script.
  2. Created the exe file with pyinstaller (without defining anything in my spec file).
  3. Bundled the Tesseract-OCR installer and my script.exe with an external installer creator.

So the end user gets both the Tesseract installer and my executable. With the external installer you have a lot of freedom, and you can also play with the PATH variable.
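With this approach the program has to locate the separately installed Tesseract at run time. A minimal sketch, assuming the installer either put tesseract on PATH or used a known folder. The helper name `find_tesseract` and the fallback folder list are made up for illustration.

```python
import os
import shutil


def find_tesseract(extra_dirs=()):
    """Return the path to a tesseract executable, or None if none is found."""
    # First try PATH, which works if the installer updated it.
    found = shutil.which("tesseract")
    if found:
        return found
    # Then try explicitly listed install folders (assumed locations).
    for folder in extra_dirs:
        candidate = os.path.join(folder, "tesseract.exe")
        if os.path.isfile(candidate):
            return candidate
    return None


tesseract_exe = find_tesseract([r"C:\Program Files\Tesseract-OCR"])

# With pytesseract installed, a found path would be assigned like this:
# import pytesseract
# if tesseract_exe:
#     pytesseract.pytesseract.tesseract_cmd = tesseract_exe
```

If nothing is found, the program can show a message asking the user to run the Tesseract installer.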

Val
  • This is a good way to play this around, thank you! However I would be unable to make a single onefile executable right? Because that implies I won't be able to reference any other executables. But I guess that would be the easiest way, thanks – Mirrah Mar 29 '20 at 15:11
  • Can you explain in more detail? – Talha Anwar Jun 23 '23 at 12:49
0

I tried with pyinstaller and ocrmypdf forever and did not get it to work. I ended up using Nuitka. Worked right from the start :-)

Use something like:

python -m nuitka --mingw64 --standalone --follow-imports yourapp.py

http://nuitka.net/doc/user-manual.html

There was a similar answer here somewhere already, but I could not find it again to link to it.

dboy
jay