I’ve been working on a small project recently. The task seems quite simple at first glance: capture a reasonably good-quality image for later OCR. The input device is an ID-card-reader prototype with an embedded camera, whose capture area is roughly credit-card or business-card sized. The trouble is that the camera is active all the time, so when no card or document is placed on the device it sees the whole wild outside world (think of an ATM with a camera mounted for face recognition, for example: the first problem there is to determine, reliably and robustly, whether a person is standing in front of the ATM).
In my case the analogous question is: when has a valid card been placed onto the device FULLY and stopped MOVING, so that I can THEN grab one image (or a few) for the later processing modules?
I’ve been searching around for similar questions. Some answers are very useful and informative, for example:
- image processing to improve tesseract OCR accuracy
- Preprocessing image for Tesseract OCR with OpenCV
- Image cleaning before OCR application
- How can I improve the accuracy of Tesseract OCR?
However, all of them assume that the capture step has already been done fairly well: no motion blur, the valid area of the document or card captured in its entirety, and so on. The examples in the links above are certainly challenging in themselves because of the usual artefacts (e.g. distortion, uneven exposure, skewed text lines), but my trouble is that I haven’t even got that far yet!
So to summarise, I’m looking for existing methods, algorithmic ideas, or relevant papers/links on:
- How to determine that a document or card has been placed on the device?
- How to determine that the valid region of the card is FULLY visible?
Some potentially useful cues I can think of:
- Motion
- Feature points (many, many choices, but how to use them properly?)
- A significant intensity-level change when the internal camera is covered by a card or document.
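To make the last two cues concrete, here is a rough sketch of what I have in mind when combining them: declare a "card event" when consecutive frames stop changing (motion cue) and the scene brightness has shifted far from an empty-device baseline (coverage cue). All the names and thresholds here are made up and would need tuning on the real device:

```python
import numpy as np

# Hypothetical thresholds -- they would need tuning on the actual hardware.
STILL_THRESH = 5.0   # max mean abs frame difference to count as "not moving"
COVER_DELTA = 40.0   # min mean-intensity shift vs. the empty-scene baseline

def mean_abs_diff(a, b):
    """Mean absolute difference between two grayscale frames."""
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).mean()

def card_event(baseline_mean, prev_gray, curr_gray):
    """Fires when the scene is still AND its overall brightness has
    shifted far from the empty-scene baseline (card covering the lens)."""
    still = mean_abs_diff(prev_gray, curr_gray) < STILL_THRESH
    covered = abs(curr_gray.mean() - baseline_mean) > COVER_DELTA
    return still and covered
```

In practice one would probably require the condition to hold for several consecutive frames before triggering a capture, to avoid firing while the card is still sliding in.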
For the first question, the difficulty is the dynamic background: passing pedestrians, cars, sudden natural-light changes, and so on.
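One direction I've been considering for the dynamic-background problem is an adaptive background model: slow changes (daylight drift, distant motion) get absorbed into the background, while a card dropped on the window shows up as one large, persistent foreground blob. Below is a toy exponential-running-average version, just to illustrate the idea; a production system would more likely use a full Gaussian-mixture model such as OpenCV's `cv2.createBackgroundSubtractorMOG2`. The `alpha` and `fg_thresh` values are guesses:

```python
import numpy as np

class RunningBackground:
    """Toy exponential-running-average background model (a much simplified
    stand-in for a Gaussian-mixture subtractor like MOG2)."""

    def __init__(self, first_gray, alpha=0.05, fg_thresh=30):
        self.bg = first_gray.astype(np.float32)
        self.alpha = alpha          # adaptation rate (assumed value)
        self.fg_thresh = fg_thresh  # per-pixel foreground threshold

    def apply(self, gray):
        """Return a boolean foreground mask and adapt the background."""
        diff = np.abs(gray.astype(np.float32) - self.bg)
        fg_mask = diff > self.fg_thresh
        # Update only background pixels, so a stationary card is not
        # absorbed into the background model too quickly.
        self.bg[~fg_mask] += self.alpha * (gray[~fg_mask] - self.bg[~fg_mask])
        return fg_mask
```

A large, stable foreground fraction (e.g. most of the frame covered for N consecutive frames) could then serve as the "card present" trigger, while pedestrians and cars produce transient, smaller blobs.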
For the second question, the problem is that the card and document types are not fixed, unfortunately (at least that's what I was told :( ). That makes a classification approach tricky, since there is no way to collect enough training samples, not to mention the half- or quarter-inserted cards…