
I want to use pytesseract in my SageMaker Jupyter notebook.

I am following this tutorial for installing pytesseract. After running pip install:

!pip install pytesseract
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Requirement already satisfied: pytesseract in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (0.3.10)
Requirement already satisfied: Pillow>=8.0.0 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from pytesseract) (9.0.1)
Requirement already satisfied: packaging>=21.3 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from pytesseract) (21.3)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/ec2-user/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages (from packaging>=21.3->pytesseract) (3.0.6)

The tutorial says I should add the tesseract executable to my PATH, but I don't know where pip installs this executable.

# If you don't have tesseract executable in your PATH, include the following:
pytesseract.pytesseract.tesseract_cmd = r'<full_path_to_your_tesseract_executable>'

If I try to run pytesseract without this, I get an error message:

from PIL import Image

import pytesseract

print(pytesseract.image_to_string(Image.open(testimage)))

results in:


~/anaconda3/envs/pytorch_p38/lib/python3.8/site-packages/pytesseract/pytesseract.py in run_tesseract(input_filename, output_filename_base, extension, lang, config, nice, timeout)
    258             raise
    259         else:
--> 260             raise TesseractNotFoundError()
    261 
    262     with timeout_manager(proc, timeout) as error_string:

TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.

I was able to find the pytesseract installation here:

/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/pytesseract

However, when I set tesseract_cmd to that location and run the same code, I get:

PermissionError: [Errno 13] Permission denied: '/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/pytesseract'

My question is related to but distinct from this question, because I am getting a permission denied error when I link to the tesseract binary.

219CID
  • Does this answer your question? [Pytesseract : "TesseractNotFound Error: tesseract is not installed or it's not in your path", how do I fix this?](https://stackoverflow.com/questions/50951955/pytesseract-tesseractnotfound-error-tesseract-is-not-installed-or-its-not-i) – Constantin Hong Dec 03 '22 at 14:20
  • Did you install the tesseract binary in Sagemaker Jupyter notebook? – Constantin Hong Dec 03 '22 at 14:22
  • I've already reviewed that question and they are not encountering the same permission denied error as me – 219CID Dec 03 '22 at 14:41
  • Okay. `tesseract_cmd` is not about the pip package. You put in the wrong path. It requires the tesseract binary. Check this error again: `TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.` – Constantin Hong Dec 03 '22 at 14:43
  • Try the `!cat /etc/os-release` command in your notebook and tell me the result. – Constantin Hong Dec 03 '22 at 14:56
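
The point made in the comments can be checked from inside the notebook. The pip package is only a Python wrapper; the tesseract binary is a separate program that pip does not install, and the path in the PermissionError above is a package directory rather than an executable file, which is consistent with that error. Below is a minimal sketch, using only the standard library, that tests whether a path is an executable file and looks for a `tesseract` binary on PATH. The helper names `is_executable_file` and `locate_tesseract` are hypothetical, introduced here for illustration; they are not part of pytesseract.

```python
import os
import shutil


def is_executable_file(path):
    """Return True only if `path` is a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def locate_tesseract(name="tesseract"):
    """Search PATH for the tesseract binary; return its path, or None if absent."""
    return shutil.which(name)


# Hypothetical diagnostic: report what, if anything, is found on PATH.
found = locate_tesseract()
if found is None:
    print("No tesseract binary on PATH; it has to be installed separately from the pip package.")
else:
    print("tesseract binary found at:", found)
```

If `locate_tesseract()` returns a path, that value (not the `site-packages/pytesseract` directory) is what would be assigned to `pytesseract.pytesseract.tesseract_cmd`. Calling `is_executable_file` on the `site-packages/pytesseract` path should return False, since it is a directory.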
