I'm trying to do the following:
- GIVEN:
A scanned or photographed document: a form that is filled in by many different programs.
I'm trying to recognize only a small part of the data, as shown here:
All symbols are digits except the first one in the top-most field which is a letter.
THE PROBLEM is that I tried the Tesseract and Google ML OCRs, but the results are very poor, maybe because the input is single symbols in cells rather than normal text. I don't know.
So I decided to try writing my own simple recognition module.
a) I convert the image to grayscale and then to B&W.
b) Unfortunately, there is no guarantee that the fields are in exactly the same places every time. They are also not the same size, because of the scan/photo.
So I'm trying to find the positions of the fields dynamically. But in the test photos I received there is no guarantee that the lines are straight. The scan/photo resolution is also not always the same.
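For reference, step a) amounts to something like the sketch below. It is a simplified illustration in plain Python, not my exact code: the function names, the luminance weights (the common 0.299/0.587/0.114 mix) and the threshold of 128 are assumptions chosen for the example.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
        for row in rgb_rows
    ]


def fixed_threshold(gray_rows, threshold=128):
    """Mark a pixel as ink (1) if it is darker than one global threshold."""
    return [[1 if value < threshold else 0 for value in row] for row in gray_rows]


# Tiny 2x2 "photo": black, white, light gray, dark gray.
image = [
    [(0, 0, 0), (255, 255, 255)],
    [(200, 200, 200), (50, 50, 50)],
]
gray = to_grayscale(image)
binary = fixed_threshold(gray)
```

A single global threshold like this is exactly what breaks down when the photo is unevenly lit.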
It would be great if someone can give me advice on the following:
Dynamically finding the fields (currently my success rate is about 50%, depending on the photo).
How to handle the non-straight lines.
How to detect the content/symbol of a single cell.
A good way to recognize single symbols/digits (comparing against reference images, etc.).
Possibly a better B&W transformation than a simple threshold.
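
To make the last point concrete, here is the kind of alternative I have in mind: a local (adaptive) mean threshold, where each pixel is compared with the average brightness of its neighbourhood instead of one global value. This is only a minimal sketch in plain Python. The function name `adaptive_threshold` and the `window` and `offset` parameters are my own assumptions for illustration, and I have not verified that this is the best option for forms like mine.

```python
def adaptive_threshold(gray, window=15, offset=10):
    """Mark a pixel as ink (1) if it is darker than its local mean minus offset.

    gray   -- list of rows of 0-255 integers
    window -- side length of the square neighbourhood (assumed odd)
    offset -- how much darker than the local mean a pixel must be
    """
    height = len(gray)
    width = len(gray[0]) if height else 0

    # Integral image with an extra zero row and column, so the sum of any
    # rectangle can be read from four entries.
    integral = [[0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0
        for x in range(width):
            row_sum += gray[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        y0, y1 = max(0, y - half), min(height, y + half + 1)
        for x in range(width):
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # Same as gray[y][x] < total / area - offset, kept in integers.
            if gray[y][x] * area < total - offset * area:
                result[y][x] = 1
    return result


# Synthetic test strip: brightness rises from left to right (uneven lighting),
# with a darker vertical stroke at columns 3 and 16.
strip = [
    [60 + 8 * x - (50 if x in (3, 16) else 0) for x in range(20)]
    for _ in range(5)
]
ink = adaptive_threshold(strip, window=7, offset=10)
```

On this synthetic strip a fixed threshold of 128 would mark the whole dark left side as ink and miss the stroke on the bright right side, while the local version keeps only the two strokes. The window should probably be somewhat larger than the stroke width, but that is a guess I would need to tune on real photos.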