Can Tesseract recognize sequences of letters in an image that are not necessarily real words or bound to any human language?

Asked Sep 21 '20 at 22:14

Active Sep 21 '20 at 22:40

Viewed 104 times

I am trying to do this in Python with Tesseract, but recognition seems to depend on the selected language to deduce the characters (which makes sense).

The input is a sequence of 14 characters, each drawn from the first 800 printable 2-byte UTF-8 characters. Even recognition limited to Latin-1 (or a smaller set) would already be useful.
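For illustration, one possible reading of that character set is the first 800 printable code points that encode to two bytes in UTF-8 (U+0080 through U+07FF). The sketch below enumerates them; the helper name `first_printable_two_byte_chars` is hypothetical, and which code points count as printable depends on the Unicode tables shipped with the Python interpreter.

```python
def first_printable_two_byte_chars(limit=800):
    """Collect the first `limit` printable characters whose UTF-8 form is 2 bytes.

    Assumption: "printable" means str.isprintable(), which excludes control
    characters, format characters, and non-ASCII separators.
    """
    chars = []
    # Code points U+0080..U+07FF are exactly those encoded as two UTF-8 bytes.
    for code_point in range(0x80, 0x800):
        ch = chr(code_point)
        if ch.isprintable():
            chars.append(ch)
            if len(chars) == limit:
                break
    return "".join(chars)


alphabet = first_printable_two_byte_chars()
```

Under this reading the alphabet spans Latin, IPA, combining marks, and part of Greek, which is far beyond what a single stock language model is likely to cover.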

According to this question, Tesseract does not seem to need proper words, but the installer asks for a training set in a specific language.

P.S. To clarify: OCR (at least in an academic setting) takes advantage of context and a dictionary to help resolve difficult letters.
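As a minimal sketch of what turning off that dictionary help might look like with the `pytesseract` wrapper: the config variables `load_system_dawg`, `load_freq_dawg`, and `tessedit_char_whitelist` are Tesseract settings, while the function names, the default `lang="eng"`, and the choice of `--psm 7` (treat the image as a single text line) are assumptions made for illustration. Whether the whitelist is honoured depends on the Tesseract version and engine in use.

```python
def build_tesseract_config(whitelist=None, psm=7):
    """Build a config string that disables Tesseract's word-list lookups.

    Hypothetical helper: psm=7 assumes the image holds one line of text.
    """
    parts = [
        f"--psm {psm}",
        "-c load_system_dawg=0",  # do not load the main dictionary
        "-c load_freq_dawg=0",    # do not load the frequent-words dictionary
    ]
    if whitelist:
        if any(ch.isspace() for ch in whitelist):
            # Whitespace would split the option when the string is tokenized.
            raise ValueError("whitelist must not contain whitespace")
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def recognize_arbitrary_letters(image_path, lang="eng", whitelist=None):
    """Run OCR on one image without dictionary correction (a sketch)."""
    # Imported lazily so the config helper works without OCR dependencies.
    import pytesseract
    from PIL import Image

    config = build_tesseract_config(whitelist)
    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, lang=lang, config=config)
    return text.strip()
```

A call such as `recognize_arbitrary_letters("sample.png", whitelist="ABCDEFGHIJKLMNOPQRSTUVWXYZ")` would still need a trained data file for the chosen `lang`, since that file supplies the glyph shapes even when the word lists are switched off; `sample.png` is a placeholder path.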

edited Sep 21 '20 at 22:40

asked Sep 21 '20 at 22:14

dawid

Please, let's avoid language-tag spam. Limit the language to the one that you're currently using. – Hovercraft Full Of Eels Sep 21 '20 at 22:16
And OCR doesn't care if the text represents a "real word" or not. Your question needs clarification. – Hovercraft Full Of Eels Sep 21 '20 at 22:17
@HovercraftFullOfEels OK, let's try only Python in this question first. OCR (at least in an academic setting) takes advantage of context and a dictionary to help resolve difficult letters. – dawid Sep 21 '20 at 22:20


0 Answers