How to get better results when using Tesseract on receipts?

Question

I'm building a Java app that scans receipts and extracts all the text using OCR with the Tesseract library. I've run the program on two images, one that I took myself and one from the internet. I get an almost perfect result with the one from the internet, but only random strings from my own image. How can I fix that? Do I need a perfect-quality, high-resolution image?

I've tried taking better images, even images with just a single word, and I'm still not getting anything.

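import java.io.File;
import net.sourceforge.tess4j.Tesseract;

Tesseract instance = new Tesseract();
instance.setDatapath(pathToMyTessData); // folder containing fra.traineddata
instance.setLanguage("fra");

// doOCR throws TesseractException, so call it inside a try/catch
String result = instance.doOCR(new File(myReceiptFile));
System.out.println(result);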

The receipt I'm trying to scan contains a lot of information that is useless to me and that I don't want to extract. Is there any way to extract only the food items, date, total, etc.?

P.S: My ticket looks like this

score 0 · Answer 1 · answered Oct 01 '19 at 23:43

Maybe you should train Tesseract. There is another post about this here.

Angelho Suarez

Training makes sense only for special, non-standard fonts or for characters missing from the training data, which is not the case for the image mentioned above. – user898678 Oct 03 '19 at 06:11

score 0 · Answer 2 · answered Oct 03 '19 at 06:15

You probably missed this SO topic: image processing to improve tesseract OCR accuracy
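
As a rough illustration of the kind of image processing that topic is about, here is a minimal sketch using only the JDK. It upscales the photo, converts it to grayscale, and binarizes it with Otsu's threshold. The class `ReceiptPreprocessor`, its method names, and the choice of Otsu's method are assumptions made for this example; they are not part of Tess4J, and the right steps depend on the photo.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

/** Hypothetical helper (not part of Tess4J): cleans up a receipt photo before OCR. */
final class ReceiptPreprocessor {

    /** Scales the image by the given factor and converts it to 8-bit grayscale. */
    static BufferedImage toScaledGray(BufferedImage src, double factor) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * factor));
        int h = Math.max(1, (int) Math.round(src.getHeight() * factor));
        BufferedImage gray = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = gray.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BICUBIC);
        g.drawImage(src, 0, 0, w, h, null);
        g.dispose();
        return gray;
    }

    /** Otsu's method: picks the gray level that maximizes between-class variance. */
    static int otsuThreshold(BufferedImage gray) {
        int w = gray.getWidth(), h = gray.getHeight();
        int[] hist = new int[256];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                hist[gray.getRaster().getSample(x, y, 0)]++;
            }
        }
        long total = (long) w * h;
        double sumAll = 0;
        for (int i = 0; i < 256; i++) {
            sumAll += (double) i * hist[i];
        }
        double sumDark = 0, bestVariance = -1;
        long countDark = 0;
        int best = 0;
        for (int t = 0; t < 256; t++) {
            countDark += hist[t];
            sumDark += (double) t * hist[t];
            if (countDark == 0) continue;      // no dark pixels yet
            long countLight = total - countDark;
            if (countLight == 0) break;        // everything is already "dark"
            double meanDark = sumDark / countDark;
            double meanLight = (sumAll - sumDark) / countLight;
            double diff = meanDark - meanLight;
            double variance = (double) countDark * countLight * diff * diff;
            if (variance > bestVariance) {
                bestVariance = variance;
                best = t;
            }
        }
        return best;
    }

    /** Pixels at or below the Otsu threshold become black (0), the rest white (255). */
    static BufferedImage binarize(BufferedImage gray) {
        int t = otsuThreshold(gray);
        int w = gray.getWidth(), h = gray.getHeight();
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int v = gray.getRaster().getSample(x, y, 0);
                out.getRaster().setSample(x, y, 0, v <= t ? 0 : 255);
            }
        }
        return out;
    }
}
```

The result can be handed to Tess4J directly, for example `instance.doOCR(ReceiptPreprocessor.binarize(ReceiptPreprocessor.toScaledGray(ImageIO.read(file), 2.0)))`. The scale factor 2.0 is only a starting guess, and a global threshold may not cope with uneven lighting or shadows on a phone photo.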

If you want a perfect result, you may need to do custom layout analysis, so that you can send Tesseract consistent text areas (i.e. regions with the same font size).
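
One very simple form of such layout analysis is a horizontal projection profile: scan the image row by row and group consecutive rows that contain dark pixels into bands, each roughly one text line. The sketch below assumes an already binarized, deskewed grayscale image with dark text on a white background. `TextBandFinder` and `findBands` are hypothetical names for this example, not part of any library.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a binarized receipt into horizontal text bands. */
final class TextBandFinder {

    /**
     * Returns full-width rectangles covering runs of consecutive rows that
     * contain at least one dark pixel. Runs shorter than minHeight are
     * treated as noise and dropped.
     */
    static List<Rectangle> findBands(BufferedImage binary, int minHeight) {
        List<Rectangle> bands = new ArrayList<>();
        int w = binary.getWidth(), h = binary.getHeight();
        int start = -1; // first row of the current band, or -1 if outside a band
        for (int y = 0; y <= h; y++) {
            // y == h acts as a sentinel row that closes a band touching the bottom edge
            boolean hasInk = y < h && rowHasInk(binary, y);
            if (hasInk && start < 0) {
                start = y;
            } else if (!hasInk && start >= 0) {
                if (y - start >= minHeight) {
                    bands.add(new Rectangle(0, start, w, y - start));
                }
                start = -1;
            }
        }
        return bands;
    }

    /** A row has ink if any pixel in band 0 is darker than mid-gray. */
    private static boolean rowHasInk(BufferedImage img, int y) {
        for (int x = 0; x < img.getWidth(); x++) {
            if (img.getRaster().getSample(x, y, 0) < 128) {
                return true;
            }
        }
        return false;
    }
}
```

Bands of similar height can then be grouped and recognized separately. Tess4J has `doOCR` overloads that accept a `Rectangle` to restrict recognition to a region, though the exact signatures should be checked against the version in use. A skewed or crumpled receipt would need deskewing first, since tilted lines merge into one band.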

user898678
