I'm looking to automate data entry from predefined forms that have been filled out by hand. The characters are not written in separate boxes, but each field can be located by the line underneath it or by its position in a table. I know that handwriting OCR is still an area of active research, and I can include an operator review step, so I do not expect accuracy above 90%.
My first idea is to combine OpenCV for field identification (http://answers.opencv.org/question/63847/how-to-extract-tables-from-an-image/) with Tesseract, called through pyocr (https://github.com/openpaperwork/pyocr), to recognize the handwriting.
A potentially simpler and more effective way to identify fields on a predefined form would be to subtract the blank form from the filled one. Since the forms would be scanned, this would likely require some tolerance for misalignment, noise reduction, and feature recognition to line the two images up.
Any suggestions or comments would be greatly appreciated.