How to improve Tesseract / Tessnet2 recognition speed and accuracy?

Question

I've seen that to limit scan errors you can define a whitelist for characters.

But I couldn't find any information about the bool numericMode parameter in ocr.Init(@"c:\temp", "fra", false);

Suppose you only want to scan numbers. Setting the whitelist to "0123456789" would be the right way to get the best recognition results, but what does the numericMode parameter of the Init method do? I've always seen it set to false, even when the whitelist was "0123456789".

Also, what are the best Bitmap parameters (PixelFormat) for the image to feed to tessnet2?

Thinkable · Answer 1 · 2012-07-31T01:42:28.053

From experience, numeric mode limits the results to numbers and supporting characters. I've seen "0123456789,.+-/*%<>$(){}" and more. Currency symbols are allowed.
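As a rough illustration of the two settings discussed here, the following sketch sets a digit-only whitelist and passes true for numericMode. It is only a sketch under assumptions: it follows the usage pattern of the tessnet2 sample code (SetVariable before Init, then DoOCR), the image path taken from args[0] is hypothetical, and the data path and "fra" language are copied from the question. Whether the whitelist and numericMode reinforce each other is not confirmed here, so compare results with numericMode set to false as well.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;

class NumericOcrSketch
{
    static void Main(string[] args)
    {
        // Hypothetical input: a file path. An in-memory Bitmap works the same way.
        using (Bitmap image = new Bitmap(args[0]))
        {
            tessnet2.Tesseract ocr = new tessnet2.Tesseract();

            // Restrict recognition to digits; set before Init, as in the tessnet2 sample.
            ocr.SetVariable("tessedit_char_whitelist", "0123456789");

            // Third argument is numericMode (the question always saw it as false).
            ocr.Init(@"c:\temp", "fra", true);

            // Rectangle.Empty means "process the whole image".
            List<tessnet2.Word> words = ocr.DoOCR(image, Rectangle.Empty);
            foreach (tessnet2.Word word in words)
            {
                Console.WriteLine("{0} : {1}", word.Confidence, word.Text);
            }
        }
    }
}
```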

Also from my experience, I've not seen any great benefit from reduced bit-depth formats over a full color image. However, I've not optimized for speed, only accuracy. If your fonts are small (lower case <= 8 pixels high), then enlarging the image can really enhance accuracy.
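One way to enlarge a bitmap before OCR is sketched below with System.Drawing. The helper name Enlarge, the bicubic interpolation, and the 24-bit output format are illustrative choices, not something the answer prescribes, and the right scale factor depends on your font size.

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;
using System.Drawing.Imaging;

static class OcrPreprocessing
{
    // Returns a new full-color bitmap enlarged by `factor`; the caller disposes it.
    public static Bitmap Enlarge(Bitmap source, float factor)
    {
        int width = (int)Math.Round(source.Width * factor);
        int height = (int)Math.Round(source.Height * factor);

        Bitmap scaled = new Bitmap(width, height, PixelFormat.Format24bppRgb);
        using (Graphics g = Graphics.FromImage(scaled))
        {
            // Bicubic interpolation keeps glyph edges smoother than nearest-neighbor.
            g.InterpolationMode = InterpolationMode.HighQualityBicubic;
            g.DrawImage(source, 0, 0, width, height);
        }
        return scaled;
    }
}
```

For example, OcrPreprocessing.Enlarge(image, 2f) doubles both dimensions; the result can then be passed to DoOCR in place of the original bitmap.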

score 1 · Answer 2 · answered Sep 29 '11 at 07:54

The question of scanning numbers is listed in the Tesseract FAQ. If you have version 3, you should be able to just issue the command:

tesseract image.tif outputbase nobatch digits

Jerry

Your answer is out of context. I'm talking about using tessnet2, which is a C# wrapper of the Tesseract 2.0 OCR library. I'm talking about doing the OCR recognition programmatically, not using the shell. Also, the bitmap is an in-memory bitmap, not a file, so the command is of no use here. – Relok Oct 03 '11 at 07:55
