1

I'm unable to read a form accurately using node-tesseract. Only the printed text of the form is recognized and returned correctly, whereas the handwritten text is returned as a string of special characters.

My code is:

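var tesseract = require('node-tesseract');

// l: language data to use; psm: page segmentation mode (6 = a single uniform block of text)
var options = {
    l: 'deu',
    psm: 6,
    env: {
        maxBuffer: 4096 * 4096
    }
};

tesseract.process('./server/images/form.jpg', options, function (err, text) {
    if (err) {
        return console.log("An error occurred: ", err);
    }
    console.log("Recognized text:");
    console.log(text);
});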

Input: OWNER Brian Dude
Output: OW_NER ägga ] )ggé;= ‘

Here, OWNER is a printed text field on the form.

yanana
  • Possible duplicate of [training tesseract for handwritten text](http://stackoverflow.com/questions/10763017/training-tesseract-for-handwritten-text) – sashoalm Mar 01 '17 at 16:06

2 Answers

3
  1. Take a look at the following papers. Both are examples that use the Tesseract training process for handwriting recognition:

     Tesseract Training for Handwritten Digit Recognition

     Training Tesseract for Roman Font Handwriting

  2. Check out the official Tesseract Training page.

  3. The following link walks you through the training process; it helped me a lot: https://web.archive.org/web/20170820212334/http://www.resolveradiologic.com:80/blog/2013/01/15/training-tesseract

  4. Use a third-party GUI for Tesseract training; it will make your life much easier. I recommend tesseract4java and jTessBoxEditor (both work on OS X).
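To give a rough idea of what the training process in the links above involves, here is a minimal sketch that only builds the list of shell commands used by the legacy (Tesseract 3.x) training tools. It does not run them. The language code `hand`, the font name `brian`, and the helper name `buildTrainingCommands` are made up for illustration, and the `mv` steps assume a Unix-like shell. Newer Tesseract versions use a different, LSTM-based training flow, so treat this as an outline rather than a recipe.

```javascript
// Sketch of the legacy Tesseract 3.x training steps as shell command strings.
// lang, font and exp are hypothetical placeholders (e.g. 'hand', 'brian', 0).
function buildTrainingCommands(lang, font, exp) {
    var base = lang + '.' + font + '.exp' + exp; // e.g. hand.brian.exp0

    return [
        // 1. Generate a box file from a scanned page of your handwriting.
        'tesseract ' + base + '.tif ' + base + ' batch.nochop makebox',
        // 2. After correcting the box file by hand (e.g. in jTessBoxEditor),
        //    produce the .tr feature file.
        'tesseract ' + base + '.tif ' + base + ' box.train',
        // 3. Extract the character set from the corrected box file.
        'unicharset_extractor ' + base + '.box',
        // 4. Shape and character-normalization training.
        //    font_properties is a small text file you write yourself.
        'mftraining -F font_properties -U unicharset -O ' + lang + '.unicharset ' + base + '.tr',
        'cntraining ' + base + '.tr',
        // 5. Prefix the generated files with the language code.
        'mv inttemp ' + lang + '.inttemp',
        'mv normproto ' + lang + '.normproto',
        'mv pffmtable ' + lang + '.pffmtable',
        'mv shapetable ' + lang + '.shapetable',
        // 6. Bundle everything into <lang>.traineddata.
        'combine_tessdata ' + lang + '.'
    ];
}

// Print the commands so they can be reviewed before running them by hand.
buildTrainingCommands('hand', 'brian', 0).forEach(function (cmd) {
    console.log(cmd);
});
```

The GUI tools mentioned above mostly help with step 2, correcting the box file, which is the tedious part for handwriting.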

XP1
akozlu
0

You can train Tesseract to recognize your handwritten text. See here.
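Once you have a trained language file, using it from node-tesseract should only require changing the `l` option. Below is a minimal sketch that reuses the options from the question. The language code `hand` and the helper names `handwritingOptions` and `recognize` are hypothetical, and it assumes the matching `hand.traineddata` file has been placed in Tesseract's tessdata directory.

```javascript
// Build node-tesseract options for a custom-trained language.
// The other settings are copied from the question's code.
function handwritingOptions(lang) {
    return {
        l: lang, // custom language code, e.g. 'hand' (hypothetical)
        psm: 6,
        env: {
            maxBuffer: 4096 * 4096
        }
    };
}

// Run OCR with the custom language. node-tesseract is loaded lazily,
// so the options helper can be used without it being installed.
function recognize(imagePath, lang, callback) {
    var tesseract = require('node-tesseract');
    tesseract.process(imagePath, handwritingOptions(lang), callback);
}
```

A call would then look like `recognize('./server/images/form.jpg', 'hand', function (err, text) { ... })`.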

yanana