I am working on a project that needs accurate OCR results for images with complex backgrounds, so I am comparing the results of two OCR engines (one of them is Tesseract) to make my choice. The results are strongly affected by the pre-processing step, especially image binarization. I extracted the binarized image produced by the other OCR engine and passed it to Tesseract, which improved Tesseract's results by 30-40%.
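To make the binarization step concrete, here is a minimal sketch of one common approach, global Otsu thresholding, in plain Python. It only illustrates what "binarizing" a grayscale image means; it is not claimed to be what Tesseract or the other OCR does internally. The function names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a list of rows of 8-bit gray values.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance."""
    # Build a 256-bin histogram of the gray values.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of gray values of those pixels
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is at or below t; nothing left to split
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map each pixel to 0 (at or below the threshold) or 255 (above it)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny synthetic image: dark values near 10, bright values near 200.
    sample = [[10, 12, 200],
              [11, 210, 205]]
    for row in binarize(sample):
        print(row)
```

A single global threshold like this tends to struggle on images with uneven or textured backgrounds, which may be part of why swapping in a different binarization changes OCR results so much.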
I have two questions, and any answers would be a great help:
- What binarization algorithm does Tesseract use, and is it configurable?
- Is there a way to extract the binarized image that Tesseract produces, so I can test the other OCR engine with it?
Thanks in advance :)