I have pictures that look like this:
And I am trying to get the output: "_ _ _ _ _ _ _ _ _ _ c _."
I am working in Python 3.6 and tried to use Tesseract (via pytesseract)
for this. What I have so far is the following code:
import pytesseract
from PIL import Image
# Point pytesseract at the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = "C:/Program Files/Tesseract-OCR/tesseract.exe"
# Tesseract options: --psm 10 (treat the image as a single character),
# --oem 3 (default engine mode), plus a character whitelist
config = "--psm 10 --oem 3 -c tessedit_char_whitelist=abcdefghijklmnopqrstuvwxyzßäöü0123456789_-"
image = Image.open("test2.png")
text = pytesseract.image_to_string(image, config=config)
print(text)
However, this doesn't work. It just produces "ee" as output. With other pictures, it sometimes recognizes the letters correctly, but never the underscores. I added the underscore to the whitelist, but that didn't help either. How can this be done better? I would be grateful for any suggestions.
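
One thing that may be worth ruling out: `--psm 10` tells Tesseract to treat the whole image as a single character, whereas `--psm 7` treats it as a single line of text, which seems closer to what these pictures contain. Also, as far as I know, the character whitelist is ignored by the LSTM engine in Tesseract 4.0 and only honored again from 4.1 on, so the installed version may matter. Below is a minimal sketch of that variation, not a confirmed fix. The helper names `build_config` and `read_line`, the grayscale conversion, and the 3x upscaling are illustrative assumptions, not anything pytesseract provides.

```python
def build_config(psm=7, oem=3, whitelist=None):
    """Assemble a Tesseract config string (hypothetical helper, not part of pytesseract)."""
    parts = [f"--psm {psm}", f"--oem {oem}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def read_line(path, scale=3, **config_kwargs):
    """OCR an image as one line of text (hypothetical helper)."""
    # Imported here so build_config can be tried without Tesseract installed.
    import pytesseract
    from PIL import Image

    # Convert to grayscale before recognition.
    image = Image.open(path).convert("L")
    # Upscale the image; the assumption is that small glyphs and thin
    # underscores are easier to recognize at a larger size.
    image = image.resize((image.width * scale, image.height * scale), Image.LANCZOS)
    config = build_config(**config_kwargs)
    return pytesseract.image_to_string(image, config=config).strip()
```

It could be called as `print(read_line("test2.png", whitelist="abcdefghijklmnopqrstuvwxyzßäöü0123456789_-"))`, after setting `pytesseract.pytesseract.tesseract_cmd` as in the code above. Whether this actually recovers the underscores is untested here.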