Best method to train Tesseract 3.02

Question

I'm wondering what the best method is to train Tesseract (what kind of text, TIFF, and so on) for a particular kind of document with these characteristics:

The structure and main text of the documents are always the same
The only things that change are 5 alphanumeric codes (THESE ARE THE REALLY IMPORTANT THINGS TO DETECT!)
Some of these codes are bold

At the moment I use the standard trained data: I detect the entire text and extract the codes with some regular expressions. It works, but I sometimes get errors, for example:

0 / O

L / I / 1

Does anyone know some "tricks" to improve precision?

Thanks!

score 4 · Accepted Answer · answered Dec 03 '14 at 15:54


During the training of Tesseract, you have to create a file by hand and give it to the engine in order to specify ambiguous characters.

For more information, look at the "unicharambigs" part of the Tesseract documentation.
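As a rough illustration, here is a minimal Python sketch that builds the text of such a file. It assumes the "v1" layout described in the Tesseract 3.x training documentation: a version line, then one tab-separated entry per ambiguity (source character count, source characters, target character count, target characters, and a type flag where 0 means non-mandatory and 1 means mandatory replacement). The helper names `ambig_line` and `build_unicharambigs` and the `CONFUSIONS` list are hypothetical; the pairs are simply the confusions listed in the question. Please check the layout against the documentation for your Tesseract version before using it.

```python
# Sketch only: builds the text of a unicharambigs file in the assumed "v1" layout.
# ambig_line, build_unicharambigs and CONFUSIONS are illustrative names,
# not part of Tesseract.

def ambig_line(source, target, mandatory=False):
    """Return one tab-separated entry for a source -> target ambiguity.

    source and target are strings; each character is treated as one unichar.
    """
    fields = [
        str(len(source)),           # number of characters in the source
        " ".join(source),           # source characters, space-separated
        str(len(target)),           # number of characters in the target
        " ".join(target),           # target characters, space-separated
        "1" if mandatory else "0",  # 1 = mandatory replacement, 0 = hint only
    ]
    return "\t".join(fields)


def build_unicharambigs(pairs):
    """Return the full file content: a version line plus one entry per pair."""
    lines = ["v1"]
    for source, target in pairs:
        lines.append(ambig_line(source, target))
    return "\n".join(lines) + "\n"


# Confusions taken from the question (0 / O and L / I / 1), in both directions.
CONFUSIONS = [
    ("0", "O"), ("O", "0"),
    ("1", "I"), ("I", "1"),
    ("1", "L"), ("L", "1"),
    ("I", "L"), ("L", "I"),
]

content = build_unicharambigs(CONFUSIONS)
print(content)
```

The printed text would then be saved as the unicharambigs file of your training set. All entries are non-mandatory here, because a mandatory replacement of "0" by "O" would also rewrite codes that really contain a zero.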

Best Regards.


Alto
