How to recognize form data from different fields of a form using OCR in Java?

Question

here is the form

I have an image of a form that contains different fields such as name, number, and address. I want to recognize the data in these fields and save it to a database. My OCR is working fine, but I don't know how to extract the data of a specific field (name, address) from the image. Simply put, I want to know how to tell whether the characters in the OCR output come from the name field, the address field, or any other field.

score 0 · Answer 1 · edited May 23 '17 at 12:34

Since you know the exact areas of the form that the different fields will be in, you can use an image manipulation library to crop the image and send only specific regions to the OCR engine.
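A minimal sketch of the cropping step, assuming a fixed form layout. The class name `FieldCropper`, the field names, and the pixel coordinates are all hypothetical; the real coordinates have to be measured on a sample scan. Only the standard `java.awt` classes are used, and each resulting crop would then be passed to whatever OCR engine you already have.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: produces one sub-image per form field so that each
// crop can be sent to the OCR engine separately.
class FieldCropper {

    // Field regions in pixels (x, y, width, height). These numbers are made
    // up for illustration; measure the real ones on a sample scan.
    static Map<String, Rectangle> sampleTemplate() {
        Map<String, Rectangle> regions = new LinkedHashMap<>();
        regions.put("name", new Rectangle(120, 40, 300, 30));
        regions.put("address", new Rectangle(120, 90, 300, 60));
        return regions;
    }

    // Crops every region out of the scanned form. Regions are clipped to the
    // image bounds, and regions lying completely outside the image are skipped.
    static Map<String, BufferedImage> cropFields(BufferedImage form,
                                                 Map<String, Rectangle> regions) {
        Map<String, BufferedImage> crops = new LinkedHashMap<>();
        Rectangle bounds = new Rectangle(0, 0, form.getWidth(), form.getHeight());
        for (Map.Entry<String, Rectangle> entry : regions.entrySet()) {
            Rectangle r = entry.getValue().intersection(bounds);
            if (r.isEmpty()) {
                continue;
            }
            crops.put(entry.getKey(), form.getSubimage(r.x, r.y, r.width, r.height));
        }
        return crops;
    }
}
```

The map keys tell you which field each OCR result belongs to, so the recognized text for `"name"` can go straight into the name column of the database. Supporting a second form type would mean adding a second template map.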

Check this SO question.

answered Nov 21 '12 at 07:28

Osiris


Yes, but how would the cropping be done automatically with exact areas? Please tell me a solution that does not crop the image and just extracts fields by their title or name. – ankita sharma Nov 21 '12 at 07:44
Since the form is always going to look the same, scan a sample form and open the image in Paint.NET or something similar. You'll be able to find the exact coordinates of the name/title boxes. – Osiris Nov 21 '12 at 07:46
Wow, thanks, that's really helpful. But could you just tell me how to do that without cropping, just by using the title of the field? Again, thank you very much. – ankita sharma Nov 21 '12 at 07:49
The form fields won't always be the same; there are different forms with other fields than the ones above. This is only a sample form. How can I automatically recognize the form fields? Please help. – ankita sharma Nov 21 '12 at 08:20

score 0 · Answer 2 · answered Feb 19 '19 at 07:57

You have two options to get the data you want: use @Osiris's cropping solution, or add a text-mining layer on top of the OCR output.

First solution: take the image and cut it into pieces, one for each region that contains data you need. For example, cut the image into two pieces, one containing the name and one containing the address, by cropping the original image based on each field's position (X and Y). For that you need an image library to manipulate the original image.

Second solution: use a text-mining layer without doing any cropping. Here you use models that detect names and addresses (duckling.ai), train your own model, or use a chatbot engine and train it to detect names and addresses as entities (recast.ai or Rasa, for example).
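Before training an entity model, a much simpler stand-in for the text-mining layer may already be enough: match the printed field labels in the OCR text. The sketch below is not a named-entity model. It assumes the OCR output keeps each label and its value on the same line, separated by a colon or hyphen (`Name: ...`), and the class name `LabelFieldParser` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls "Label: value" pairs out of raw OCR text.
class LabelFieldParser {

    // Returns the text following each known label on the same line.
    // Labels that do not occur in the OCR text are left out of the result.
    static Map<String, String> parse(String ocrText, List<String> labels) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String label : labels) {
            // (?i) ignores case, (?m) lets ^ and $ match at line boundaries.
            // [ \t]* is used instead of \s* so the match never spills onto
            // the next line.
            Pattern pattern = Pattern.compile(
                    "(?im)^[ \\t]*" + Pattern.quote(label) + "[ \\t]*[:\\-][ \\t]*(.*)$");
            Matcher matcher = pattern.matcher(ocrText);
            if (matcher.find()) {
                fields.put(label, matcher.group(1).trim());
            }
        }
        return fields;
    }
}
```

Because the lookup is driven by a list of labels rather than by coordinates, a different form only needs a different label list. Multi-line values, labels printed above their value, or OCR errors inside the label itself are cases where this breaks down and a trained model or the cropping approach becomes necessary.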
