How to train tesseract to identify only numbers

Question

I have some sample images of product tags that contain only numbers. I have preprocessed these images so that they can be used for digit recognition. I used the English trained data file, but the results were really bad. Is there a way to train a data set using template images?

I have referred to the Tesseract training documentation, but I couldn't train using the images.

Also, once I have the box file, how do I create the eng.traineddata file?

Can someone please help me?

This is the cropped original image of the product tag: https://i.stack.imgur.com/ShefI.jpg

This is the processed image of the product tag: https://i.stack.imgur.com/0tDFW.jpg

score 0 · Answer 1 · answered Oct 29 '13 at 23:56

You could try setting a whitelist of characters to be recognised (digits in your case). The parameter is called tessedit_char_whitelist. Honestly, the results can be mixed.
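As a rough sketch of the whitelist approach, the snippet below calls the tesseract command-line tool with `-c tessedit_char_whitelist=0123456789`. It assumes the `tesseract` executable is on the PATH; the helper names (`whitelist_command`, `keep_digits`, `read_digits`) and the file name `tag.png` are made up for illustration. Since the whitelist may not be honoured by every traineddata file, the output is also filtered down to digits afterwards.

```python
import subprocess

DIGITS = "0123456789"


def whitelist_command(image_path, whitelist=DIGITS):
    """Build a tesseract CLI call that restricts recognition to `whitelist`.

    Hypothetical helper: only the argument list is built here, nothing runs.
    """
    return [
        "tesseract",
        image_path,
        "stdout",  # print the recognised text instead of writing a file
        "-c",
        f"tessedit_char_whitelist={whitelist}",
    ]


def keep_digits(text):
    """Drop every character that is not an ASCII digit (safety net)."""
    return "".join(ch for ch in text if ch in DIGITS)


def read_digits(image_path):
    """Run tesseract on the image and return only the digits it reported.

    Assumes the tesseract executable is installed and on the PATH.
    """
    result = subprocess.run(
        whitelist_command(image_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return keep_digits(result.stdout)


if __name__ == "__main__":
    # Show the command that would be executed for a hypothetical file.
    print(" ".join(whitelist_command("tag.png")))
```

Calling `read_digits("tag.png")` would run the command on a real image; the post-filter only strips stray non-digit characters and cannot fix digits that were misread.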

Remon Nashid

score 0 · Answer 2 · answered Feb 05 '19 at 07:33

Whitelisting only works if you have a traineddata file that supports it. If you want a fast result, use Tesseract 3.x; there should be plenty of traineddata files available that support whitelisting (which works very well).

I myself used Tesseract 4 with a traineddata file that worked tremendously well with the following options: -l digits --psm 10

See this post for the link to the data set: "Can not find Tesseract 4.0 tessdata only for Numbers"
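For reference, the options from this answer can be passed to the tesseract command-line tool roughly as follows. This is only a sketch: it assumes Tesseract 4 is on the PATH and that a `digits.traineddata` file (such as the one from the linked post) has been placed in the tessdata directory. The helper names and `digit.png` are hypothetical. Note that `--psm 10` treats the image as a single character, so a tag with a whole row of digits may need a different page segmentation mode, for example `--psm 7` (single text line).

```python
import subprocess


def digits_model_command(image_path, lang="digits", psm=10):
    """Build a tesseract CLI call using a digits-only traineddata file.

    Hypothetical helper: `lang` must match a <lang>.traineddata file that is
    actually installed in the tessdata directory.
    """
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


def recognize(image_path, psm=10):
    """Run tesseract and return the recognised text without surrounding whitespace.

    Assumes the tesseract executable is installed and on the PATH.
    """
    result = subprocess.run(
        digits_model_command(image_path, psm=psm),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Print the command for a hypothetical single-digit image.
    print(" ".join(digits_model_command("digit.png")))
```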