I am trying to use tesseract on this image:
When I use default configuration:
tesseract image.jpg stdout
It returns \KD FWOW.
As you can see, the only mistake is the first letter, L, being recognized as a backslash.
So I created a config file named letters in /usr/share/tesseract-ocr/tessdata/configs with the setting:
tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ
The goal is to recognize just letters, not special characters. However, when I run tesseract with this config:
tesseract image.jpg stdout letters
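For reference, the same setting can also be passed inline, which takes the config file and its location out of the picture. This is only a minimal sketch: it assumes a tesseract build that accepts the -c VAR=VALUE option, that image.jpg is in the current directory, and that the whitelist is the full uppercase alphabet. The WHITELIST variable name is just for illustration.

```shell
# Same whitelist as the "letters" config file, passed on the command line.
# Assumption: this tesseract build supports -c VAR=VALUE.
WHITELIST="ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Only run when tesseract is installed and the image is present.
if command -v tesseract >/dev/null 2>&1 && [ -f image.jpg ]; then
  tesseract image.jpg stdout -c tessedit_char_whitelist="$WHITELIST"
fi
```

If this prints the same text as the run with the letters config, the config file itself is being read correctly and the difference comes from how the whitelist affects recognition.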
The result is XKD FVOIV, and now more than one character is wrong; most notably, the 'W' is no longer recognized.
This makes no sense to me. I can't figure out why it stopped recognizing the W when it is on the whitelist. I am surely missing something in the config.
How can I fix it?