I am trying to use tesseract on this image:

[image]

When I use default configuration:

tesseract image.jpg stdout

It returns \KD FWOW.
As you can see, the only mistake is the first letter, L, being recognized as a backslash.

So, I created a config file in /usr/share/tesseract-ocr/tessdata/configs with the setting:

tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUWXYZ
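
For reference, this is roughly how I set the file up. It is only a sketch: the scratch directory stands in for the real configs directory so it runs without root, and the file name `letters` is simply the name I pass on the command line.

```shell
# Scratch directory standing in for /usr/share/tesseract-ocr/tessdata/configs
# (assumption: used here only so the sketch runs without root).
config_dir="$(mktemp -d)"

# A config file is plain text: one "variable value" pair per line.
printf 'tessedit_char_whitelist %s\n' 'ABCDEFGHIJKLMNOPQRSTUWXYZ' > "$config_dir/letters"

# Show what was written.
cat "$config_dir/letters"
```

As far as I know, the same variable can also be set without a file through tesseract's `-c name=value` option.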

The goal is to recognize just letters, not special characters. However, when I run tesseract with this config:

tesseract image.jpg stdout letters

The result is XKD FVOIV, so now it gets more than one character wrong, most notably the 'W'.

This makes no sense to me. I can't figure out why it stopped recognizing the W when it is on the whitelist. I must be missing something in the config.
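
To rule out a typo in the whitelist string itself, a quick check along these lines could be run. It is a minimal sketch with my own variable names (`whitelist`, `missing`), and nothing in it is a tesseract feature. It lists every uppercase letter that the string does not contain.

```shell
# The whitelist string exactly as it appears in the config file.
whitelist='ABCDEFGHIJKLMNOPQRSTUWXYZ'

# Collect every uppercase letter that does not occur in the whitelist.
missing=''
for letter in A B C D E F G H I J K L M N O P Q R S T U V W X Y Z; do
  case "$whitelist" in
    *"$letter"*) ;;                   # letter is present, nothing to do
    *) missing="$missing$letter" ;;   # letter is absent, remember it
  esac
done

echo "missing from whitelist: ${missing:-none}"
```

Any letter it reports is one that tesseract is not allowed to output under this config.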

How can I fix it?

Tales Pádua
  • Why not rectangularize the image first? That is called preprocessing. Without proper preparation of the data, any CV operation is useless. – Spektre Jun 23 '16 at 06:13
  • The image was preprocessed up to this point, but I am not using OpenCV; I am using ImageMagick. – Tales Pádua Jun 23 '16 at 14:08
  • That does not matter; I do not use OpenCV either. Find the skew from the left and right sides and transform back to a rectangular bounding box, similar to this: http://stackoverflow.com/a/30273878/2521214 – Spektre Jun 23 '16 at 14:10
