Recognize symbols on a scanned/shot document

Question

I'm trying to do the following:

GIVEN: A scanned or photographed document: a form that is filled in by many different programs. I'm trying to recognize only a small part of the data, as shown here:

All symbols are digits except the first one in the top-most field which is a letter.

THE PROBLEM is that I tried Tesseract and Google ML OCR, but the results are very poor, maybe because the input is single symbols in cells rather than normal text. I don't know.
So I decided to try writing my own simple recognition module.

a) I convert the image to grayscale and then to B&W.

b) Unfortunately, there is no guarantee that the fields are in exactly the same places every time. They also differ in size because of the scan/photo.

So I'm trying to find the positions of the fields dynamically. But in the test photos I received, there is no guarantee that the lines are straight. The scan/photo resolution is also not always the same.

It would be great if someone can give me advice on the following:

Dynamically finding the fields (currently my success rate is about 50%, depending on the photo).
How to handle the non-straight lines.
How to detect the content/symbol of a single cell.
A good way to recognize single symbols/digits (comparing with the source, etc.).
And maybe a better B&W conversion than a simple threshold.

Not sure if this helps, but you should look at these: https://stackoverflow.com/questions/51119801/detect-corners-of-grid https://stackoverflow.com/questions/48954246/find-sudoku-grid-using-opencv-and-python — Rohit Rawat, Jun 04 '20 at 12:21

score 1 · Answer 1 · answered Jun 05 '20 at 18:40

Try removing the rectangular frames around the numbers before recognition. With morphological operations such as closing or opening, you can close the frame at the bottom of the picture, which preserves the numbers.
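As a rough illustration of the closing step, here is a minimal pure-Python sketch. It assumes a binary image stored as a list of rows with ink as 1 and paper as 0, and it reads "closing the frame" as filling small gaps in the frame lines so that the frame forms one connected outline. The function names (`dilate`, `erode`, `close_gaps`) and the square window are choices made for this sketch, not something taken from the answer.

```python
def _window(img, y, x, radius):
    """Yield the in-bounds pixels of the square window centred on (y, x)."""
    h, w = len(img), len(img[0])
    for ny in range(max(0, y - radius), min(h, y + radius + 1)):
        for nx in range(max(0, x - radius), min(w, x + radius + 1)):
            yield img[ny][nx]


def dilate(img, radius=1):
    """A pixel becomes ink (1) if any pixel in its window is ink."""
    return [[int(any(_window(img, y, x, radius))) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img, radius=1):
    """A pixel stays ink (1) only if every in-bounds pixel in its window is ink."""
    return [[int(all(_window(img, y, x, radius))) for x in range(len(img[0]))]
            for y in range(len(img))]


def close_gaps(img, radius=1):
    """Morphological closing: dilation followed by erosion.

    Gaps in the ink narrower than about 2 * radius pixels get filled,
    while larger shapes keep roughly their original outline.
    """
    return erode(dilate(img, radius), radius)
```

On real images the same operation is available in OpenCV as `cv2.morphologyEx` with `cv2.MORPH_CLOSE`; the radius would need tuning so that it bridges breaks in the frame without merging digits into it.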

With ImageMagick, for example:

convert input.jpg -threshold 90% -fuzz 25% -fill black -floodfill +0+0 white -fill white -floodfill +0+0 black out.png
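The idea behind that command can be sketched in plain Python: threshold the image, flood-fill the outer background from the top-left corner with black so that it merges with the frame, then flood-fill the same region with white so that background and frame disappear together. This is only a sketch under assumptions: the image is a list of rows of 0..255 grayscale values, the pixel at (0, 0) is background, the fill is 4-connected, and the digits do not touch the frame. The names `flood_fill` and `remove_frame` and the threshold of 230 (roughly 90% of 255) are illustrative. No fuzz is needed here because the image is strictly two-valued after thresholding.

```python
from collections import deque

WHITE, BLACK = 255, 0


def flood_fill(img, start, new_value):
    """Recolour, in place, the 4-connected region that contains `start` (row, col)."""
    h, w = len(img), len(img[0])
    sy, sx = start
    old_value = img[sy][sx]
    if old_value == new_value:
        return img  # nothing to do, and avoids an endless loop
    img[sy][sx] = new_value
    queue = deque([(sy, sx)])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] == old_value:
                img[ny][nx] = new_value
                queue.append((ny, nx))
    return img


def remove_frame(gray, threshold=230):
    """Return a binary copy of `gray` with the frame connected to the border removed."""
    # Step 1: simple global threshold to pure black and white.
    binary = [[WHITE if p > threshold else BLACK for p in row] for row in gray]
    # Step 2: outer background -> black, so it joins the black frame lines.
    flood_fill(binary, (0, 0), BLACK)
    # Step 3: background plus frame -> white; enclosed digits stay black.
    flood_fill(binary, (0, 0), WHITE)
    return binary
```

If a frame line has a break, the first fill leaks into the cell and the second fill then erases the digit as well, which is why closing the frame first matters.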
