
My software needs to read a fixed-length handwritten number.

While I could use a general-purpose library like Tesseract, I am sure there is something better suited. Tesseract will probably misread some 1s or 7s as I or l, whereas software that expects only digits would not.

Knowing that the field contains only digits (written the American-English way), the algorithm could choose among 10 possible matches instead of hundreds of symbols.
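The point about choosing among 10 candidates can be sketched as follows. This assumes a hypothetical recognizer that returns a score for every symbol it knows; `best_symbol` and the scores are made up for illustration and are not part of Tesseract or any other library. Restricting the pick to `'0'`..`'9'` is what lets an ambiguous stroke come out as `1` instead of `l`.

```cpp
#include <cctype>
#include <limits>
#include <map>

// Hypothetical per-symbol scores from some recognizer (higher = more likely).
// Returns the highest-scoring symbol, optionally considering only '0'..'9'.
// Returns '?' if no symbol qualifies.
char best_symbol(const std::map<char, double>& scores, bool digits_only) {
    char best = '?';
    double best_score = -std::numeric_limits<double>::infinity();
    for (const auto& [symbol, score] : scores) {
        // Skip non-digits when the field is known to contain only numbers.
        if (digits_only && !std::isdigit(static_cast<unsigned char>(symbol))) {
            continue;
        }
        if (score > best_score) {
            best = symbol;
            best_score = score;
        }
    }
    return best;
}
```

With the made-up scores `{'l': 0.41, 'I': 0.33, '1': 0.22, '7': 0.04}`, `best_symbol` returns `'l'` when unrestricted and `'1'` when `digits_only` is true.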

Any experience OCRing handwritten number-only fields?
What open source library/software did you get the best results with?

Nicolas Raoul

1 Answer


From the Tesseract FAQ:

How do I recognize only digits?

In 2.03 and above:

Use

TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");

before calling an Init function or put this in a text file called tessdata/configs/digits:

tessedit_char_whitelist 0123456789

and then your command line becomes:

tesseract image.tif outputbase nobatch digits

Warning: Until the old and new config variables get merged, you must have the nobatch parameter too.
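Because the field is fixed-length, the whitelisted output can also be checked after recognition. A minimal sketch, assuming the text Tesseract produced (for example the contents of `outputbase.txt`) has already been read into a string; `parse_fixed_digits` is a hypothetical helper, not part of Tesseract:

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical post-processing step for a fixed-length, digits-only field.
// Whitespace (spaces between digits, line breaks) is dropped; any other
// non-digit symbol causes the field to be rejected rather than guessed at.
std::optional<std::string> parse_fixed_digits(const std::string& ocr_text,
                                              std::size_t expected_length) {
    std::string digits;
    for (char c : ocr_text) {
        unsigned char uc = static_cast<unsigned char>(c);
        if (std::isdigit(uc)) {
            digits.push_back(c);
        } else if (!std::isspace(uc)) {
            return std::nullopt;  // unexpected symbol
        }
    }
    // Accept only if exactly the expected number of digits was recognized.
    if (digits.size() != expected_length) {
        return std::nullopt;
    }
    return digits;
}
```

A rejected field (wrong digit count or a stray symbol) could then be routed to manual entry instead of being silently accepted.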

But I think that since it was designed for printed, not handwritten, text, accuracy might suffer even when restricted to digits.

Joey
  • Thanks for this! But indeed Tesseract doesn't seem to be designed for handwritten input, so it would probably be quite mediocre at it. – Nicolas Raoul Apr 01 '10 at 07:46
  • @nic: Maybe you could re-train it. It seems to be possible. – Joey Apr 01 '10 at 07:52
  • Seems possible indeed. But if I propose this solution to the client company, they might look at me funny... A proven solution with a community (even a small one) would probably be more credible. I would be surprised if one does not exist already. – Nicolas Raoul Apr 01 '10 at 08:22
  • Now that I think about it, handwritten digits should not be that difficult to recognize... they are not joined into fuzzy words the way Latin letters are. They should be much easier to recognize than handwritten text. – Nicolas Raoul Apr 01 '10 at 08:25
  • @Nicolas, did you manage to find a proper solution for your use case (OCR adapted for handwritten numerals)? – Miroslav Dzhokanov Jan 06 '16 at 12:13
  • @MiroslavDzhokanov: Unfortunately not. By the way, this question was off-topic here, so I recreated it at http://softwarerecs.stackexchange.com/questions/27834/accurate-open-source-ocr-for-handwritten-numbers – Nicolas Raoul Jan 07 '16 at 04:26
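The observation that handwritten digits are not joined suggests a simple way to split the field before classifying each digit. A rough sketch, assuming the field image has already been binarized into a hypothetical `BinaryImage` (true = ink) whose rows all have the same width; digits that touch or overlap would merge into one segment, so this is a starting point rather than a complete segmenter:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical binarized field image: image[row][col] is true for ink.
using BinaryImage = std::vector<std::vector<bool>>;

// Vertical projection segmentation: columns containing no ink separate one
// digit from the next. Returns half-open [start, end) column ranges.
std::vector<std::pair<std::size_t, std::size_t>> segment_columns(const BinaryImage& image) {
    std::vector<std::pair<std::size_t, std::size_t>> segments;
    if (image.empty()) return segments;
    const std::size_t width = image[0].size();
    bool in_digit = false;
    std::size_t start = 0;
    for (std::size_t col = 0; col < width; ++col) {
        // Does any row have ink in this column?
        bool has_ink = false;
        for (const auto& row : image) {
            if (row[col]) { has_ink = true; break; }
        }
        if (has_ink && !in_digit) {
            start = col;  // a new digit begins
            in_digit = true;
        } else if (!has_ink && in_digit) {
            segments.emplace_back(start, col);  // the current digit ends
            in_digit = false;
        }
    }
    if (in_digit) segments.emplace_back(start, width);  // digit touching the right edge
    return segments;
}
```

For a fixed-length field, comparing the number of segments with the expected length gives a cheap sanity check before each segment is classified among the 10 digit classes.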