I'm trying to run Tesseract OCR on an image that is embedded in a PDF file. I'm on Windows, using Anaconda with Spyder as my IDE. I get the following error: TesseractNotFoundError: C:\Users\jsbno\AppData\Local\Programs\Tesseract-OCR is not installed or it's not in your PATH. See README file for more information.

What can I do to solve this issue? See my code below for further information:

from pathlib import Path
from PIL import Image
import pytesseract
import typing
from borb.pdf.pdf import PDF
from borb.pdf import Document
from borb.toolkit.ocr.ocr_as_optional_content_group import OCRAsOptionalContentGroup
from borb.toolkit.text.simple_text_extraction import SimpleTextExtraction

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\jsbno\AppData\Local\Programs\Tesseract-OCR'

def apply_ocr_to_document():
    # Set up everything for OCR
    tesseract_data_dir: Path = Path("C:\\Temp")
    assert tesseract_data_dir.exists()
    l: OCRAsOptionalContentGroup = OCRAsOptionalContentGroup(tesseract_data_dir)

    # Read Document
    doc: typing.Optional[Document] = None
    with open("C:\\Temp\\paklijst1.pdf", "rb") as pdf_file_handle:
        doc = PDF.loads(pdf_file_handle, [l])

    assert doc is not None

    # Initialize SimpleTextExtraction with the OCR result
    text_extraction = SimpleTextExtraction(doc)

    # Extract the text from the document
    extracted_text = text_extraction.extract()

    # Print the extracted text
    print(extracted_text)

# Call the function to apply OCR and extract the text
apply_ocr_to_document()

I've installed Tesseract-OCR for Windows in C:\Users\jsbno\AppData\Local\Programs\Tesseract-OCR, which is the default folder the installer uses. After the installation, I added that folder to my PATH through the Control Panel, as described in the link below. Afterward, I installed pytesseract from the CMD.exe Prompt in Anaconda. I followed every step on this page: https://codetoprosper.com/tesseract-ocr-for-windows/
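One thing that may be worth checking: in the code above, tesseract_cmd is set to the install folder, while pytesseract expects the path to the Tesseract executable itself. The sketch below is a small diagnostic rather than a confirmed fix. It assumes the binary is named tesseract.exe and sits directly inside the install folder; the helper name find_tesseract_executable is hypothetical and not part of pytesseract or borb.

```python
import shutil
from pathlib import Path
from typing import Optional


def find_tesseract_executable(
    install_dir: Path, exe_name: str = "tesseract.exe"
) -> Optional[Path]:
    """Return the path to the Tesseract binary, or None if it cannot be found.

    Hypothetical helper: the file name "tesseract.exe" is an assumption
    about how the Windows installer lays out its files.
    """
    # If the given path already points at a file, treat it as the executable.
    if install_dir.is_file():
        return install_dir

    # Otherwise look for the executable directly inside the install folder.
    candidate = install_dir / exe_name
    if candidate.is_file():
        return candidate

    # Fall back to whatever the current process can see on PATH.
    # Note: a PATH change made in the Control Panel is only visible to
    # programs started after the change.
    on_path = shutil.which("tesseract")
    return Path(on_path) if on_path else None


if __name__ == "__main__":
    install_dir = Path(r"C:\Users\jsbno\AppData\Local\Programs\Tesseract-OCR")
    print(find_tesseract_executable(install_dir))
```

If this prints a path ending in tesseract.exe, that full path (converted with str()) is what would be assigned to pytesseract.pytesseract.tesseract_cmd. If it prints None, the binary is neither in that folder nor visible on the PATH of the running Spyder session, in which case restarting Spyder after the PATH change may be worth trying.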

JustinB