I recently came across Tesseract and OpenCV. It looks like Tesseract is a full-fledged OCR engine and OpenCV can be used as a framework to create an OCR application/service.
I tried Tesseract on some of my images and its accuracy seems decent. Later, I came across a very simple Python tutorial on performing OCR with OpenCV and was impressed. Within a few minutes I had finished training the system, and its accuracy was good. But of course, taking this approach means I would need to train my system extensively on a large training set.
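For context, here is a rough sketch of what I mean by the train-it-yourself approach. It is not the tutorial's code: I am assuming a k-nearest-neighbours classifier over flattened pixel values, written in plain Python with made-up 3x3 glyphs, and the names `flatten`, `knn_predict` and `TRAIN` are purely illustrative. A real OpenCV pipeline would work on actual image crops rather than toy bitmaps.

```python
from collections import Counter


def flatten(glyph):
    """Turn rows of '#' (ink) and '.' (background) into a flat 0/1 feature vector."""
    return [1 if ch == "#" else 0 for row in glyph for ch in row]


def knn_predict(train, sample, k=3):
    """Label `sample` by majority vote among its k nearest training vectors."""
    # Squared Euclidean distance from the sample to every labelled vector,
    # sorted so the closest examples come first.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, sample)), label)
        for vec, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set: two hand-drawn examples per character.
TRAIN = [
    (flatten([".#.", ".#.", ".#."]), "I"),
    (flatten([".#.", ".#.", "##."]), "I"),
    (flatten(["#..", "#..", "###"]), "L"),
    (flatten(["#..", "#..", "##."]), "L"),
    (flatten(["###", "#.#", "###"]), "O"),
    (flatten(["###", "#.#", ".##"]), "O"),
]

# A slightly noisy "L" that is not in the training set.
noisy_l = flatten(["#..", "#.#", "###"])
print(knn_predict(TRAIN, noisy_l))  # L
```

With only a handful of examples per character this works on clean input, which is why I expect a real application to need a much larger training set.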
My specific questions are the following:
- How does one choose between Tesseract and using OpenCV to build a custom OCR app?
- There are training datasets available for Tesseract for different languages. Does OpenCV have something similar, so that I don't have to start from the ground up to achieve OCR?
- Which one is better for an application that I hope to turn into a commercial product?
Any suggestions?