
I am working on a handwritten form recognition system. So far I have been able to detect text using Java with OpenCV, but now I want to read the text inside each of the bounding boxes.

I have been researching how to do this using Java with OpenCV, but I was unable to find anything.

Please suggest some links, technologies, methods, or processes for performing this task in Java.

  • If you are able to draw the bounding boxes and you are able to detect text, I don't really get what the problem is. – Rick M. Jun 08 '17 at 09:13
  • So far I have only drawn rectangular bounding boxes around the text in the image, but I haven't been able to extract the text inside those boxes. I want to print that text to my console as a string. – Abhishek Pawar Jun 08 '17 at 10:05
  • For that you have to make a custom OCR detection algorithm, since the letters are handwritten. Ideally a machine learning algorithm trained with all the possibilities and then you need to predict "labels" for the "letters" in the bounding boxes – Rick M. Jun 08 '17 at 10:20
  • OK. Since I am a beginner, could you point me to some material to refer to? This is pretty much new to me. – Abhishek Pawar Jun 08 '17 at 11:12
  • Sure, something like this for a start [Simple Digit Recognition OCR in OpenCV-Python](https://stackoverflow.com/questions/9413216/simple-digit-recognition-ocr-in-opencv-python) – Rick M. Jun 08 '17 at 11:25
  • Okay, thank you for your guidance. Please share any documentation you have on this. – Abhishek Pawar Jun 09 '17 at 07:48

1 Answer


This answer is more general than question-specific. I will try to stick to the problem statement as much as possible.

Although there is a lot of ongoing research on recognizing handwritten text, there is no foolproof method that works for every problem.

The sample image you posted is relatively noisy, with extremely high variance in how the same letter is written. This is exactly where it gets tricky.

I would personally suggest that once you have the bounding boxes around the text (which you already do), you run contour extraction inside each bounding box to extract the individual letters. Once you have them, you need to work out one or more relevant features that capture as much of the variance of each letter as possible (or at least roughly 95% of it).
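As a rough illustration of the letter-extraction step, here is a minimal sketch in plain Java. It assumes the contents of one bounding box have already been binarized into a `boolean[][]` (true meaning ink), and it uses connected-component flood fill as a stand-in for contour extraction; with OpenCV's Java bindings the analogous calls would be `Imgproc.findContours` followed by `Imgproc.boundingRect`. The class and method names (`LetterSegmenter`, `findLetterBoxes`, `fromRows`) are hypothetical and only for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;

// Hypothetical helper: splits a binarized text region into per-letter boxes.
class LetterSegmenter {

    /** Builds a binary image from text rows: '#' is ink, anything else is background. */
    static boolean[][] fromRows(String... rows) {
        boolean[][] ink = new boolean[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            ink[r] = new boolean[rows[r].length()];
            for (int c = 0; c < rows[r].length(); c++) {
                ink[r][c] = rows[r].charAt(c) == '#';
            }
        }
        return ink;
    }

    /**
     * Returns one box per connected ink blob as {top, left, bottom, right}
     * (inclusive), sorted left to right. Assumes a rectangular image.
     */
    static List<int[]> findLetterBoxes(boolean[][] ink) {
        int rows = ink.length;
        int cols = rows == 0 ? 0 : ink[0].length;
        boolean[][] seen = new boolean[rows][cols];
        List<int[]> boxes = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                if (ink[r][c] && !seen[r][c]) {
                    boxes.add(floodFill(ink, seen, r, c));
                }
            }
        }
        boxes.sort(Comparator.comparingInt((int[] box) -> box[1]));
        return boxes;
    }

    /** Visits every ink pixel 8-connected to the start pixel and tracks its extent. */
    private static int[] floodFill(boolean[][] ink, boolean[][] seen, int startRow, int startCol) {
        int[] box = {startRow, startCol, startRow, startCol};
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] {startRow, startCol});
        seen[startRow][startCol] = true;
        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            box[0] = Math.min(box[0], p[0]);
            box[1] = Math.min(box[1], p[1]);
            box[2] = Math.max(box[2], p[0]);
            box[3] = Math.max(box[3], p[1]);
            for (int dr = -1; dr <= 1; dr++) {
                for (int dc = -1; dc <= 1; dc++) {
                    int nr = p[0] + dr;
                    int nc = p[1] + dc;
                    if (nr >= 0 && nr < ink.length && nc >= 0 && nc < ink[nr].length
                            && ink[nr][nc] && !seen[nr][nc]) {
                        seen[nr][nc] = true;
                        stack.push(new int[] {nr, nc});
                    }
                }
            }
        }
        return box;
    }
}
```

Calling `LetterSegmenter.findLetterBoxes(LetterSegmenter.fromRows("#..##", "#..#.", "#..##"))` yields two boxes, one per blob. Note that this simple approach splits letters made of separate strokes (the dot of an "i", for example) and merges letters that touch, so real handwriting usually needs extra merging or splitting rules.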

With these features, you need to train a supervised learning algorithm, using the letter samples as training data and the actual character each sample represents as its label. Once it is trained, give it some test data (both the easiest and the most difficult cases) to analyze its accuracy.
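To make the supervised step concrete, here is a minimal sketch of a k-nearest-neighbour classifier in plain Java. Nearest-neighbour is just one simple supervised choice, not necessarily the best one for handwriting. It assumes each letter has already been reduced to a fixed-length `double[]` feature vector; the class name `NearestNeighbourClassifier` and the toy two-dimensional features in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical k-nearest-neighbour classifier over letter feature vectors.
class NearestNeighbourClassifier {
    private final List<double[]> features = new ArrayList<>();
    private final List<Character> labels = new ArrayList<>();
    private final int k;

    NearestNeighbourClassifier(int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be at least 1");
        }
        this.k = k;
    }

    /** Stores one labelled training sample (the "training" for k-NN is just memorizing). */
    void train(double[] featureVector, char label) {
        features.add(featureVector.clone());
        labels.add(label);
    }

    /** Predicts the label by majority vote among the k closest training samples. */
    char predict(double[] query) {
        if (features.isEmpty()) {
            throw new IllegalStateException("no training data");
        }
        // Sort the training sample indices by distance to the query.
        Integer[] order = new Integer[features.size()];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> squaredDistance(features.get(i), query)));

        // Count votes among the k nearest; the first label to reach the top count wins.
        Map<Character, Integer> votes = new HashMap<>();
        char best = labels.get(order[0]);
        int bestVotes = 0;
        for (int i = 0; i < Math.min(k, order.length); i++) {
            char label = labels.get(order[i]);
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = label;
            }
        }
        return best;
    }

    /** Squared Euclidean distance; assumes both vectors have the same length. */
    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```

For example, after calling `train(new double[] {0.1, 0.2}, 'a')` and `train(new double[] {0.9, 0.8}, 'b')` on a classifier built with `k = 1`, `predict(new double[] {0.2, 0.2})` returns `'a'`. To analyze accuracy, keep some labelled letters out of training and count how many of them are predicted correctly.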

These links can help you get started:

  1. One of the first tools I use to check the accuracy of a set of features before I start coding: Weka

  2. Go through basic tutorials on machine learning and how the algorithms work - Personal Favorite

  3. You could try TensorFlow.

  4. [Simple Digit Recognition OCR in OpenCV-Python](https://stackoverflow.com/questions/9413216/simple-digit-recognition-ocr-in-opencv-python) - Great for beginners.

Hope it helps!

Rick M.