I'm trying to use pytesseract to extract text from images. I have followed the relevant instructions, re-installed everything, and tried most of the suggestions on SO, but I still get the error shown at the end. These are the steps I took:
- Installed pytesseract and tesseract in a conda env:
- conda install -c conda-forge pytesseract
- conda install -c conda-forge tesseract
- Checked that both are present:
- For pytesseract:
import pytesseract
help(pytesseract)
- And for tesseract itself:
tesseract -h
- Determined the location of the tesseract binary (for pytesseract's tesseract_cmd setting) using:
where tesseract
- Full code:
import pytesseract
from PIL import Image
pytesseract.pytesseract.tesseract_cmd = r'/Users/<usr>/opt/anaconda3/envs/<conda_env>/bin/tesseract'
image_path = '/Users/<usr>/Library/Mobile Documents/com~apple~CloudDocs/Code/Python/PDF-to-markdown/output/97-1.jpg'
image = Image.open(image_path)
text = pytesseract.image_to_string(image)
print(text)
- Resulting error message:
File ~/opt/anaconda3/envs/pasttime/lib/python3.11/site-packages/pytesseract/pytesseract.py:423, in image_to_string(image, lang, config, nice, output_type, timeout)
418 """
419 Returns the result of a Tesseract OCR run on the provided image to string
420 """
421 args = [image, 'txt', lang, config, nice, timeout]
--> 423 return {
424 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
425 Output.DICT: lambda: {'text': run_and_get_output(*args)},
426 Output.STRING: lambda: run_and_get_output(*args),
427 }[output_type]()
...
262 with timeout_manager(proc, timeout) as error_string:
263 if proc.returncode:
--> 264 raise TesseractError(proc.returncode, get_errors(error_string))
TesseractError: (-9, '')
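
For context on the `-9` in the error: as far as I understand, pytesseract passes along the return code of the tesseract process, and on POSIX systems Python's `subprocess` reports a process that was terminated by a signal as the negated signal number. So `-9` would mean the binary was killed by signal 9 (SIGKILL) rather than exiting with an error of its own, which would also explain the empty error string. Below is a minimal sketch for checking this outside of pytesseract. The helper names `describe_returncode` and `run_and_describe` are my own (hypothetical), not part of pytesseract.

```python
import signal
import subprocess


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code is the negated number of the
        # signal that terminated the process.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


def run_and_describe(cmd: list[str]) -> str:
    """Run a command directly and describe how it ended."""
    proc = subprocess.run(cmd, capture_output=True)
    return describe_returncode(proc.returncode)
```

Calling `run_and_describe([tesseract_path, "--version"])`, with `tesseract_path` set to the same path as `tesseract_cmd` above, should show whether the binary is killed even without an image, which would suggest the problem lies with the executable itself rather than with pytesseract or the input file.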