I'm trying to use Tesseract (tess-two) to read the data on an ID card or a check. Has anyone done this? I am having problems recognizing the text: the result contains a lot of extra characters.
In my experience with Tesseract OCR, I get much better results if I convert the image to binary (every pixel is either black or white). OCR engines tend to work better when there is high contrast. For information about how to convert Android Bitmaps to binary images, take a look at this question (Android: Convert Grayscale to Binary Image).
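A minimal sketch of that conversion, assuming you already have the pixels as an ARGB `int[]` (on Android, `Bitmap.getPixels` fills such an array and `Bitmap.setPixels` on a mutable bitmap writes it back). The class name `Binarize` and the fixed threshold of 128 are illustrative choices, not part of tess-two. Each pixel is converted to grayscale with the standard BT.601 luma weights, then mapped to pure black or white:

```java
// Minimal sketch (hypothetical helper, not a tess-two API): threshold ARGB
// pixels, e.g. from Android's Bitmap.getPixels, into pure black/white.
final class Binarize {
    static final int BLACK = 0xFF000000;
    static final int WHITE = 0xFFFFFFFF;

    /** Returns a new array in which every pixel is opaque black or opaque white. */
    static int[] toBinary(int[] argb, int threshold) {
        int[] out = new int[argb.length];
        for (int i = 0; i < argb.length; i++) {
            int p = argb[i];
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            // BT.601 luma weights in integer arithmetic (alpha is ignored)
            int gray = (299 * r + 587 * g + 114 * b) / 1000;
            out[i] = gray < threshold ? BLACK : WHITE;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] pixels = {0xFF202020, 0xFFE0E0E0, 0xFF808080, 0xFF7F7F7F};
        for (int p : toBinary(pixels, 128)) {
            System.out.println(p == BLACK ? "black" : "white");
        }
        // prints: black, white, white, black (one per line)
    }
}
```

A fixed threshold is the simplest option; a photo of an ID or check with uneven lighting may need an adaptive method (for example Otsu's threshold) instead.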
This link explains why black and white images tend to work better, and also talks about other ways to improve OCR accuracy (https://marinersoftware.deskpro.com/kb/articles/294-which-steps-can-be-taken-to-improve-the-accuracy-of-ocr-results-in-paperless).
While pre-processing the input image will improve accuracy, it may also help to post-process the output text, for example by discarding characters that cannot appear in the field you are reading.
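One way to post-process is sketched below, under the assumption that the field you want (such as an ID number) only contains uppercase letters, digits, and spaces. `OcrCleanup`, the allowed-character string, and the minimum line length are hypothetical and should be adapted to the document you are scanning. It drops disallowed characters, collapses repeated spaces, and discards lines that become too short to be meaningful:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-processing helper for raw OCR output.
final class OcrCleanup {
    /** Keeps only characters in `allowed`, then trims and collapses runs of spaces. */
    static String cleanLine(String line, String allowed) {
        StringBuilder sb = new StringBuilder();
        for (char c : line.toCharArray()) {
            if (allowed.indexOf(c) >= 0) {
                sb.append(c);
            }
        }
        return sb.toString().trim().replaceAll(" +", " ");
    }

    /** Cleans each line and drops lines shorter than `minLength` after cleaning. */
    static List<String> clean(String ocrText, String allowed, int minLength) {
        List<String> result = new ArrayList<>();
        for (String line : ocrText.split("\\r?\\n")) {
            String cleaned = cleanLine(line, allowed);
            if (cleaned.length() >= minLength) {
                result.add(cleaned);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "ID: 12345678-X\n~`'.,\nNAME  JOHN  DOE!";
        String allowed = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 ";
        System.out.println(clean(raw, allowed, 3));
        // prints: [ID 12345678X, NAME JOHN DOE]
    }
}
```

Tesseract also has a `tessedit_char_whitelist` variable that restricts recognition at the source, which may be worth trying before filtering afterwards.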
-
I got good text recognition at first. But I have to stretch the image after it has been taken on iOS or Android, and after that I get a big bucket of extra characters. I tried OCRTest because I only need to read part of the ID, but even with the example (OCRTest for Android) I get extra characters or even unrecognized lines. Did you use any tutorial? – init-ec Mar 31 '14 at 23:10
-
Check out this tutorial (http://rmtheis.wordpress.com/2011/08/06/using-tesseract-tools-for-android-to-create-a-basic-ocr-app/). Also, try using images with huge letters and a black-and-white color scheme, like this one (http://data2.whicdn.com/images/13725579/large.jpg), and see what the output text looks like. – ashwin153 Mar 31 '14 at 23:16