
I'm trying to do the following:

  1. GIVEN: A scanned or photographed document: a form that is filled in by many different programs. I'm trying to recognize only a small part of the data, as shown in the attached image.

All symbols are digits except the first one in the top-most field which is a letter.

  2. THE PROBLEM is that I tried Tesseract and Google's ML OCR, but the results are very poor, maybe because the input is single symbols in cells rather than normal text. I don't know.

  3. So I decided to try my own simple recognition module.

a) I convert the image to grayscale and then to B&W with a simple threshold.

b) Unfortunately, it is not guaranteed that the fields are in exactly the same places every time. They also vary in size because of the scan/photo.

So I'm trying to find the positions of the fields dynamically. But in the test photos I received, there is no guarantee that the lines are straight. The scan/photo resolution is also not always the same.

It would be great if someone can give me advice on the following:

  1. Dynamically finding the fields. (Currently my success rate is about 50%, depending on the photo.)

  2. How to handle the non-straight lines.

  3. How to detect a single cell content/symbol.

  4. A good way to recognize single symbols/digits (compare with the source, etc.).

  5. And maybe a better B&W-transformation, not a simple threshold.

Ivan Peshev
    Not sure if this helps, but you should look at these: https://stackoverflow.com/questions/51119801/detect-corners-of-grid https://stackoverflow.com/questions/48954246/find-sudoku-grid-using-opencv-and-python – Rohit Rawat Jun 04 '20 at 12:21

1 Answer


Try removing the rectangular frames around the digits before recognition. With morphological operations such as closing/opening, you can close the frame at the bottom of the picture and thereby preserve the digits.
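As a rough illustration of what closing does to a broken frame line, here is a minimal pure-Python sketch. The function names, the boolean ink mask, and the horizontal 1 x (2k+1) structuring element are assumptions made for illustration only; an image library's morphology routine would normally do this job.

```python
def dilate_h(mask, k):
    """Horizontal dilation: a pixel becomes ink if any pixel within k columns is ink."""
    cols = len(mask[0])
    return [[any(row[max(0, c - k):min(cols, c + k + 1)]) for c in range(cols)]
            for row in mask]


def erode_h(mask, k):
    """Horizontal erosion: a pixel stays ink only if all pixels within k columns are ink.
    Pixels outside the image are ignored, so the border does not eat into lines."""
    cols = len(mask[0])
    return [[all(row[max(0, c - k):min(cols, c + k + 1)]) for c in range(cols)]
            for row in mask]


def close_h(mask, k=1):
    """Closing = dilation followed by erosion; it fills gaps of up to 2*k pixels."""
    return erode_h(dilate_h(mask, k), k)


# A frame line with a one-pixel gap (True = ink).
line = [[True, True, False, True, True]]
closed = close_h(line, k=1)
print(closed)
```

A gap wider than 2*k pixels stays open, so k has to be chosen from the gap size in the actual scans. Choosing it too large can also join a digit to a nearby frame line, so it is worth checking on a few samples.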

convert input.jpg  -threshold 90% -fuzz 25% -fill black -floodfill +0+0 white -fill white -floodfill +0+0 black out.png
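To make the idea behind that command easier to follow, here is a minimal pure-Python sketch of the same two flood fills on a tiny grayscale grid. It assumes 4-connected neighbours, a white background at the top-left pixel, and exact colour matching (no -fuzz tolerance); `flood_fill` and `remove_frame` are illustrative names, not part of ImageMagick.

```python
from collections import deque

WHITE, BLACK = 255, 0


def flood_fill(img, seed, target, fill):
    """Replace the 4-connected region of `target` pixels containing `seed` with `fill`."""
    rows, cols = len(img), len(img[0])
    r0, c0 = seed
    if target == fill or img[r0][c0] != target:
        return
    img[r0][c0] = fill
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and img[nr][nc] == target:
                img[nr][nc] = fill
                queue.append((nr, nc))


def remove_frame(gray, threshold=0.9 * 255):
    """Binarize, then erase the background and everything connected to it."""
    # Step 1: threshold at 90%: brighter pixels become white, the rest black.
    img = [[WHITE if p > threshold else BLACK for p in row] for row in gray]
    # Step 2: paint the outer white background black, starting at the corner.
    flood_fill(img, (0, 0), WHITE, BLACK)
    # Step 3: paint that black region white again; the frame touches it,
    # so the frame disappears too. Digits inside closed cells are not reached.
    flood_fill(img, (0, 0), BLACK, WHITE)
    return img


# 7x7 toy image: white page, a closed black frame, one "digit" pixel in the middle.
W, B = 255, 0
gray = [
    [W, W, W, W, W, W, W],
    [W, B, B, B, B, B, W],
    [W, B, W, W, W, B, W],
    [W, B, W, B, W, B, W],
    [W, B, W, W, W, B, W],
    [W, B, B, B, B, B, W],
    [W, W, W, W, W, W, W],
]
cleaned = remove_frame(gray)
print(cleaned[3][3], sum(p == BLACK for row in cleaned for p in row))
```

In this sketch the digit survives only because the frame is closed. If the frame has a gap, the first fill leaks into the cell, the digit becomes part of the black region, and the second fill erases it, which is why the frame needs to be closed first.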


Alex Alex