I’ve been working on a small project recently. The task seems quite simple at first glance: capture a reasonably good-quality image for later OCR. The input device is an ID-card-reader prototype with an embedded camera, whose capture area is roughly credit-card or business-card sized. The trouble is that the camera is active all the time, so when no card or document is placed on the device it sees the whole wild outside world (think of an ATM with a camera mounted for face recognition, for example: the first problem there is to determine, reliably and robustly, whether a person is standing in front of the ATM).
In my case the analogous question is: when has a valid card been placed onto the device FULLY and stopped MOVING, so that I can THEN grab one image (or a few) for the later processing modules?
I’ve been searching around for similar questions. Some answers are very useful and informative, for example:
- image processing to improve tesseract OCR accuracy
- Preprocessing image for Tesseract OCR with OpenCV
- Image cleaning before OCR application
- How can I improve the accuracy of Tesseract OCR?
However, all of them assume that the capture step has already been done fairly well: no motion blur, the valid area of the document or card captured in its entirety, and so on. The examples in the links above are certainly challenging in themselves because of the usual artefacts (e.g. distortion, uneven exposure, skewed text lines), but my trouble is that I haven’t even got that far yet!
So to summarise, I’m looking for existing methods, algorithmic ideas, or relevant papers/links on:
- How to determine that a document or card has been placed on the device?
- How to determine that the valid region of the card is FULLY visible?
Some potentially useful cues I can think of:
- Motion
- Feature points (many, many choices, but how to use them properly?)
- A significant intensity-level change when the internal camera is covered by a card or document.
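To make the last two cues concrete, here is a rough sketch of what I have in mind when combining them: declare a "card event" when consecutive frames stop changing (motion cue) and the scene brightness has shifted far from an empty-device baseline (coverage cue). All the names and thresholds here are made up and would need tuning on the real device:

```python
import numpy as np

# Hypothetical thresholds -- they would need tuning on the actual hardware.
STILL_THRESH = 5.0   # max mean abs frame difference to count as "not moving"
COVER_DELTA = 40.0   # min mean-intensity shift vs. the empty-scene baseline

def mean_abs_diff(a, b):
    """Mean absolute difference between two grayscale frames."""
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).mean()

def card_event(baseline_mean, prev_gray, curr_gray):
    """Fires when the scene is still AND its overall brightness has
    shifted far from the empty-scene baseline (card covering the lens)."""
    still = mean_abs_diff(prev_gray, curr_gray) < STILL_THRESH
    covered = abs(curr_gray.mean() - baseline_mean) > COVER_DELTA
    return still and covered
```

In practice one would probably require the condition to hold for several consecutive frames before triggering a capture, to avoid firing while the card is still sliding in.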
For the first question, the difficulty is the dynamic background: passing pedestrians, cars, sudden natural-light changes, and so on.
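One direction I've been considering for the dynamic-background problem is an adaptive background model: slow changes (daylight drift, distant motion) get absorbed into the background, while a card dropped on the window shows up as one large, persistent foreground blob. Below is a toy exponential-running-average version, just to illustrate the idea; a production system would more likely use a full Gaussian-mixture model such as OpenCV's `cv2.createBackgroundSubtractorMOG2`. The `alpha` and `fg_thresh` values are guesses:

```python
import numpy as np

class RunningBackground:
    """Toy exponential-running-average background model (a much simplified
    stand-in for a Gaussian-mixture subtractor like MOG2)."""

    def __init__(self, first_gray, alpha=0.05, fg_thresh=30):
        self.bg = first_gray.astype(np.float32)
        self.alpha = alpha          # adaptation rate (assumed value)
        self.fg_thresh = fg_thresh  # per-pixel foreground threshold

    def apply(self, gray):
        """Return a boolean foreground mask and adapt the background."""
        diff = np.abs(gray.astype(np.float32) - self.bg)
        fg_mask = diff > self.fg_thresh
        # Update only background pixels, so a stationary card is not
        # absorbed into the background model too quickly.
        self.bg[~fg_mask] += self.alpha * (gray[~fg_mask] - self.bg[~fg_mask])
        return fg_mask
```

A large, stable foreground fraction (e.g. most of the frame covered for N consecutive frames) could then serve as the "card present" trigger, while pedestrians and cars produce transient, smaller blobs.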
For the second question, the problem is that the card and document types are not fixed, unfortunately (at least that's what I was told :( ). That makes a classification approach tricky, since there is no way to collect enough training samples, not to mention the half- or quarter-inserted cards…