I'm building a Java app that scans receipts and extracts all the text using OCR with the Tesseract library. I ran the program on two images, one that I took myself and one from the internet. I get an almost perfect result with the one from the internet, but only random strings from my own image. How can I fix that? Do I need a perfect-quality, high-resolution image?
I've tried taking better pictures, even images containing just a single word, and I still get nothing usable.
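import java.io.File;
import net.sourceforge.tess4j.Tesseract;

Tesseract instance = new Tesseract();
instance.setDatapath(pathToMyTessData); // folder containing fra.traineddata
instance.setLanguage("fra");            // French language model
// doOCR throws TesseractException if the image cannot be processed
String result = instance.doOCR(new File(myReceiptFile));
System.out.println(result);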
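For context, this is the kind of preprocessing that might matter here. It is only a sketch, under the unconfirmed assumption that my photo fails because of small text or low contrast. The class `ReceiptPreprocessor` and the method `toScaledGray` are hypothetical names, and the code uses only standard `java.awt` classes:

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class ReceiptPreprocessor {

    /**
     * Hypothetical helper: returns a grayscale copy of the source image,
     * resized by the given scale factor (for example 2.0 to double it).
     */
    public static BufferedImage toScaledGray(BufferedImage src, double scale) {
        int w = Math.max(1, (int) Math.round(src.getWidth() * scale));
        int h = Math.max(1, (int) Math.round(src.getHeight() * scale));

        // Drawing into a TYPE_BYTE_GRAY image performs the color-to-gray conversion.
        BufferedImage out = new BufferedImage(w, h, BufferedImage.TYPE_BYTE_GRAY);
        Graphics2D g = out.createGraphics();
        try {
            // Bicubic interpolation keeps character edges smoother when upscaling.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                               RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return out;
    }
}
```

The result could then be passed to the `doOCR(BufferedImage)` overload instead of the file, for example `instance.doOCR(ReceiptPreprocessor.toScaledGray(ImageIO.read(new File(myReceiptFile)), 2.0))`. Whether this actually improves the output on my photo is untested.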
The receipt I'm trying to scan contains a lot of information that is useless to me and that I don't want to extract. Is there any way to extract only the food items, the date, the total, etc.?
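To make the goal concrete, here is a rough sketch of the filtering I have in mind, applied to the string returned by `doOCR`. It rests on assumptions that may not hold for every receipt: the date is printed as `dd/MM/yyyy`, and the total sits on a line containing the word `TOTAL` followed by an amount with two decimals. The class `ReceiptFields` and its methods are hypothetical names, and food items are not handled:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReceiptFields {

    // Assumption: dates look like 12/03/2021.
    private static final Pattern DATE =
            Pattern.compile("\\b(\\d{2}/\\d{2}/\\d{4})\\b");

    // Assumption: "TOTAL", then non-digit text on the same line, then an
    // amount such as 3,45 or 3.45. Case-insensitive.
    private static final Pattern TOTAL =
            Pattern.compile("(?i)\\bTOTAL\\b[^\\d\\n]*(\\d+[.,]\\d{2})");

    /** Returns the first date found in the OCR text, if any. */
    public static Optional<String> findDate(String ocrText) {
        Matcher m = DATE.matcher(ocrText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Returns the amount on the first line that mentions TOTAL, if any. */
    public static Optional<String> findTotal(String ocrText) {
        Matcher m = TOTAL.matcher(ocrText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

With this, `ReceiptFields.findTotal(result)` would give the amount as text, leaving number parsing to the caller. Note that a `SOUS-TOTAL` line would also match the `TOTAL` pattern, so this only works if the OCR output is clean and the layout is predictable.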
P.S.: My receipt looks like this