I'm trying to use pytesseract to extract text from images. I have re-installed everything and tried most of the suggestions I could find on SO. These are the installation steps I followed:

  1. Installed pytesseract and tesseract in a conda env:
    • `conda install -c conda-forge pytesseract`
    • `conda install -c conda-forge tesseract`
  2. Checked that both are present:
    • For pytesseract:
      • `import pytesseract`
      • `help(pytesseract)`
    • And for tesseract itself: `tesseract -h`
  3. Determined the location of the binary using `where tesseract`
  4. Full code:
import pytesseract
from PIL import Image

pytesseract.pytesseract.tesseract_cmd = r'/Users/<usr>/opt/anaconda3/envs/<conda_env>/bin/tesseract'

image_path = '/Users/<usr>/Library/Mobile Documents/com~apple~CloudDocs/Code/Python/PDF-to-markdown/output/97-1.jpg'

image = Image.open(image_path)
text = pytesseract.image_to_string(image)
print(text)
  5. Resulting error message:
File ~/opt/anaconda3/envs/pasttime/lib/python3.11/site-packages/pytesseract/pytesseract.py:423, in image_to_string(image, lang, config, nice, output_type, timeout)
    418 """
    419 Returns the result of a Tesseract OCR run on the provided image to string
    420 """
    421 args = [image, 'txt', lang, config, nice, timeout]
--> 423 return {
    424     Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    425     Output.DICT: lambda: {'text': run_and_get_output(*args)},
    426     Output.STRING: lambda: run_and_get_output(*args),
    427 }[output_type]()
...
    262 with timeout_manager(proc, timeout) as error_string:
    263     if proc.returncode:
--> 264         raise TesseractError(proc.returncode, get_errors(error_string))

TesseractError: (-9, '')
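
A note on reading the error: the first element of the `TesseractError` tuple is the `proc.returncode` shown in the traceback above. Python's `subprocess` module reports a child that was terminated by a signal as the negated signal number, so `-9` suggests the `tesseract` process was killed by signal 9 (SIGKILL on macOS and Linux) rather than exiting with an error of its own, which would also fit the empty error string. The helper below is a minimal sketch for decoding such a value; `describe_returncode` is a hypothetical name and not part of pytesseract.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Explain a subprocess return code in plain words (POSIX semantics)."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess reports "terminated by signal N" as -N
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not defined on this platform
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # The value taken from TesseractError: (-9, '')
    print(describe_returncode(-9))
```

This only names the signal; it does not say what sent it, so the cause of the kill still has to be found separately.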
Nimantha
Koen

1 Answer

Have you installed Tesseract itself on your machine, separately from pytesseract?

For Windows, you can download and install the build here.

You can refer here for more details.
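
One way to check this from Python, without going through pytesseract, is to look the binary up on `PATH` and run it once. This is a rough sketch using only the standard library; `check_tesseract` is a hypothetical helper name, and it assumes the binary accepts the `--version` flag. You can also pass the full path you assigned to `tesseract_cmd` instead of the bare command name.

```python
import shutil
import subprocess


def check_tesseract(cmd="tesseract"):
    """Return (path, returncode, first_output_line) for the binary,
    or (None, None, None) if it cannot be found or is not executable."""
    path = shutil.which(cmd)
    if path is None:
        return None, None, None
    # Run the binary directly so any failure is not hidden behind pytesseract
    proc = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some builds print the version banner to stderr instead of stdout
    output = (proc.stdout or proc.stderr).strip()
    first_line = output.splitlines()[0] if output else ""
    return path, proc.returncode, first_line


if __name__ == "__main__":
    print(check_tesseract())
```

If the path comes back as `None`, the binary is not reachable from the Python process. A non-zero return code here means the problem can be reproduced outside pytesseract.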

Ajeet Verma
  • Yes, as I mentioned in my question: conda install -c conda-forge tesseract. I even verified it using `tesseract -h` which lists all kinds of options that come with tesseract. – Koen Mar 21 '23 at 10:00
  • Right, but you also need to install the Tesseract binary on your machine. I just tried it and it works. – Ajeet Verma Mar 21 '23 at 10:03