I have some sample images of product tags that contain only numbers. I have preprocessed those images so that they can be used for digit recognition. I used the English trained data file (eng.traineddata), but the results were really bad. Is there a way I can train a data set using template images?

I have referred to the Tesseract training documentation, but I couldn't get training to work with my images.

Once I have the box file, how can I create the eng.traineddata file?

Can someone please help me?

This is the cropped original image of the product tag: https://i.stack.imgur.com/ShefI.jpg

This is the processed image of the product tag: https://i.stack.imgur.com/0tDFW.jpg

samiles
Sean Mosby

2 Answers

You could try setting a whitelist of characters to be recognised (digits in your case). The parameter is called tessedit_char_whitelist. Honestly, results can be mixed.
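
A minimal sketch of how the whitelist could be passed from Python, assuming the pytesseract wrapper and Pillow are installed (neither is mentioned in the question, so treat both as assumptions). The helper names `build_digit_config` and `recognize_digits` are made up for illustration, and the default `--psm 7` (treat the image as a single text line) is only a guess at what suits a one-line tag.

```python
def build_digit_config(psm: int = 7, whitelist: str = "0123456789") -> str:
    """Build a Tesseract config string that limits output to `whitelist`.

    psm=7 (single text line) is an assumption; adjust it to the tag layout.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def recognize_digits(image_path: str, psm: int = 7) -> str:
    """Run OCR on `image_path` with the digit whitelist applied."""
    # Imported lazily so build_digit_config works without the OCR dependencies.
    from PIL import Image
    import pytesseract

    text = pytesseract.image_to_string(
        Image.open(image_path),
        config=build_digit_config(psm),
    )
    return text.strip()
```

Calling `recognize_digits("tag.png")` (a hypothetical file name) would then return whatever Tesseract reads under that restriction. Whether the whitelist is honoured depends on the engine and the traineddata in use, so it is worth checking against a few known tags first.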

Remon Nashid

You can only use whitelisting if you have a traineddata set that supports it. If you want a fast result, use Tesseract 3.x; there should be plenty of traineddata files available that support whitelisting (which works very well).

I myself used Tesseract 4 with a traineddata file that worked extremely well with the following options: -l digits --psm 10
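
For reference, here is a rough sketch of calling the tesseract command-line tool with those options from Python. It assumes the `tesseract` binary is on the PATH and that a `digits.traineddata` file is available to it; the function names and the optional `tessdata_dir` argument are illustrative, not part of the setup described above. Note that `--psm 10` tells Tesseract to treat the image as a single character, so a tag with several digits on one line may need a different mode such as `--psm 7`.

```python
import subprocess


def build_tesseract_command(image_path, lang="digits", psm=10, tessdata_dir=None):
    """Return the argument list for: tesseract IMAGE stdout -l LANG --psm N."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]
    if tessdata_dir is not None:
        # Optional: directory that contains <lang>.traineddata
        cmd += ["--tessdata-dir", tessdata_dir]
    return cmd


def run_tesseract(image_path, **kwargs):
    """Run the tesseract CLI and return the recognised text."""
    result = subprocess.run(
        build_tesseract_command(image_path, **kwargs),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout.strip()
```

Something like `run_tesseract("tag.png", psm=7)` (file name hypothetical) would print the recognised digits for a single-line tag.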

See this post for the link to the data set: Can not find Tesseract 4.0 tessdata only for Numbers

Dorian Gaensslen