
I've been struggling to make the tess-two OCR project work, and when I finally did, it recognized text well as long as the text is clear, even across multiple lines.

The whole point of this is that I need to use OCR to extract the credit card number when the user takes a photo of the card.

Here is an example of a credit card number:

[image: sample credit card]

This is just an example; I used many pictures. For instance, with this image I got the following text:

1238 5578 8875 5877
1238 5578 8875 5877
1238 5578 8875 5877

Here is the code I use for this:

TessBaseAPI baseApi = new TessBaseAPI();
// The path must contain a "tessdata" directory with eng.traineddata inside.
baseApi.init("/mnt/sdcard/tesseract-ocr", "eng");

// Treat the image as a single uniform block of text (PSM 6).
baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_BLOCK);

// Restrict recognition to digits (plus "/" for the expiry date).
baseApi.setVariable(TessBaseAPI.VAR_CHAR_WHITELIST, "/1234567890");

baseApi.setImage(bm);
String recognizedText = baseApi.getUTF8Text();
baseApi.end();
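As an aside, since the target is a credit card number, one way to filter out obviously wrong OCR readings is the Luhn checksum that card numbers satisfy. This is a minimal plain-Java sketch, independent of tess-two; the class and method names are just illustrative:

```java
// Luhn checksum validation for a candidate card number extracted by OCR.
public class LuhnCheck {
    public static boolean isValidCardNumber(String text) {
        // Strip spaces, slashes, and any other OCR noise.
        String digits = text.replaceAll("\\D", "");
        if (digits.length() < 13 || digits.length() > 19) {
            return false; // typical card number lengths
        }
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            // Double every second digit from the right; if the result
            // exceeds 9, subtract 9 (equivalent to summing its digits).
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9;
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        // A valid number's checksum is divisible by 10.
        return sum % 10 == 0;
    }
}
```

Running this on the example output above ("1238 5578 8875 5877") rejects it, which at least tells you the read is not a real card number.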

Any help would be much appreciated.

Thanks!

TootsieRockNRoll

2 Answers


Some preprocessing of the image might help Tesseract perform better. I can suggest a whole paper (http://wbieniec.kis.p.lodz.pl/research/files/07_memstech_ocr.pdf) if you have time; if not, try playing with the image's contrast, for example.
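To illustrate the contrast idea, here is a sketch of a simple linear contrast stretch on 8-bit grayscale pixel values. It is plain Java with illustrative names; on Android you would first extract the grayscale values from the Bitmap before handing the image to Tesseract:

```java
// Linear contrast stretch: map the observed [min, max] range onto [0, 255].
public class ContrastStretch {
    public static int[] stretch(int[] gray) {
        int min = 255, max = 0;
        for (int v : gray) {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        int[] out = new int[gray.length];
        if (max == min) {
            // Flat image: nothing to stretch, return a copy unchanged.
            System.arraycopy(gray, 0, out, 0, gray.length);
            return out;
        }
        for (int i = 0; i < gray.length; i++) {
            // Rescale each pixel so min becomes 0 and max becomes 255.
            out[i] = (gray[i] - min) * 255 / (max - min);
        }
        return out;
    }
}
```

A low-contrast photo (e.g. values clustered in 100–200) gets spread across the full 0–255 range, which tends to make the embossed digits stand out more for OCR.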

Here are also some ideas that can fit your issue: http://www.ocr-it.com/user-scenario-process-digital-camera-pictures-and-ocr-to-extract-specific-numbers

yonutix

Your example images are already fit for OCR, but as far as I can see you're using Tesseract's built-in "eng.traineddata" model, and it's not suitable (in terms of accuracy/performance) for credit card scanning. (Credit cards use the font "OCR-A".) You'll need to either find a pretrained model and swap it in when initializing the tess API, or train one yourself from scratch.

For training a Tesseract model on a custom font, see Yash Modi's answer here
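Swapping the model in is mostly a matter of file placement: tess-two looks for `<dataPath>/tessdata/<lang>.traineddata`. A small sketch, where "ocra" is a hypothetical language code for an OCR-A model you have obtained or trained:

```java
// Builds the path at which tess-two expects a traineddata file.
// "ocra" here is an assumed name for a custom OCR-A model; tess-two
// itself does not ship one.
public class TessDataPath {
    public static String trainedDataFile(String dataPath, String lang) {
        // tess-two requires traineddata files under a "tessdata" subdirectory.
        return dataPath + "/tessdata/" + lang + ".traineddata";
    }
}

// Usage on device (assuming ocra.traineddata is already in place):
// TessBaseAPI baseApi = new TessBaseAPI();
// baseApi.init("/mnt/sdcard/tesseract-ocr", "ocra");
```

If the file is missing or misnamed, `init` fails, so checking that the path from `trainedDataFile` exists before calling `init` is a cheap sanity check.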

Ege Yıldırım