One of the problems I am working on is running OCR on documents. A few of the paystub documents have a highlighted line filled with dots to set off important elements such as Gross Pay, Net Pay, etc.
These dots cause erroneous OCR results: the engine reads them as ':' characters, so the output is not what I want. I have tried several image-processing approaches, including ImageMagick, to remove the dots, but in each case the quality of the rest of the text is degraded, resulting in poor OCR.
The ImageMagick command I have tried is:
convert mm150.jpg -kuwahara 3 mm2.jpg
I have also tried connected-component analysis, erosion with various kernels, etc., but each method fails in some way.
I would like to know whether there is a method I should follow, or whether I am missing some image-processing capability.
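For reference, the connected-components idea mentioned above can be sketched roughly as follows. This is a minimal illustration in plain Python, assuming the page has already been binarized into rows of 0/1 pixels (1 = ink). The function name `remove_small_components` and the `max_area` threshold are hypothetical, chosen only for this sketch, and the toy image stands in for a real scan.

```python
from collections import deque


def remove_small_components(binary, max_area):
    """Return a copy of `binary` in which every 8-connected foreground
    component of at most `max_area` pixels is cleared to 0."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    result = [row[:] for row in binary]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if binary[y][x] == 0 or seen[y][x]:
                continue
            # Flood-fill one component, collecting its pixels.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] == 1
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small blobs are treated as dots and erased (assumed threshold).
            if len(component) <= max_area:
                for py, px in component:
                    result[py][px] = 0
    return result


# Toy example: a 3x3 "letter" blob and two isolated 1-pixel "dots".
image = [
    [1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 0, 0, 0],
]
cleaned = remove_small_components(image, max_area=2)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

A filter like this cannot tell a leader dot from a real period or the dot on an "i" by area alone, so the threshold has to be tuned per document and may still remove genuine punctuation.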