I am trying to extract character images from a text line image, so that I can feed these images into my K-Nearest Neighbor classifier (I am building my own OCR system).
I have retrieved the text line image and wonder how I should proceed to extract the characters.
My first attempt was to use horizontal projection to cut the image (computed from the binary image).
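To make the projection idea concrete, here is a minimal sketch in plain Python. It assumes the projection meant here is the per-column ink count, and that the binary line image is available as a list of rows of 0/1 values (1 = ink); with an OpenCV/NumPy array the same profile would normally be obtained by summing along axis 0. The function name `split_by_projection` and the parameter `min_ink` are illustrative, not part of any library.

```python
def split_by_projection(binary, min_ink=1):
    """Return (start, end) column ranges, end exclusive, that contain ink.

    `binary` is a list of rows, each a list of 0/1 values (1 = ink).
    `min_ink` is a hypothetical threshold: columns with fewer ink pixels
    are treated as gaps between characters.
    """
    if not binary:
        return []
    width = len(binary[0])
    # Projection profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in binary) for x in range(width)]

    segments = []
    start = None
    for x, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = x                      # a run of inked columns begins
        elif count < min_ink and start is not None:
            segments.append((start, x))    # the run ended at a gap column
            start = None
    if start is not None:
        segments.append((start, width))    # run reaches the right edge
    return segments


# Tiny 3x8 example: three glyph-like blobs separated by empty columns.
line = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 0, 1],
]
print(split_by_projection(line))  # [(0, 2), (4, 5), (6, 8)]
```

A cut is only made where a column has (almost) no ink, so two characters that touch produce a single segment.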
My second attempt was to retrieve the contours of the connected components and treat them as separate characters. This attempt gets good results, but a letter such as 'i' cannot be retrieved as a single character because it consists of two disconnected contours.
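One possible way to handle the 'i' case, sketched under assumptions: take the bounding box of every contour (for example from `cv2.findContours` followed by `cv2.boundingRect`, which returns `(x, y, w, h)`) and merge boxes whose horizontal ranges overlap, so that a dot and its stem end up in one box. The helper `merge_stacked_boxes` and its `min_overlap` parameter are hypothetical names chosen for illustration, and the 0.5 default is an untuned guess.

```python
def merge_stacked_boxes(boxes, min_overlap=0.5):
    """Merge boxes (x, y, w, h) whose x-ranges overlap enough.

    Two boxes are merged when their horizontal overlap is at least
    `min_overlap` times the width of the narrower box (an assumed rule).
    """
    merged = []
    for x, y, w, h in sorted(boxes):           # left to right
        if merged:
            mx, my, mw, mh = merged[-1]
            overlap = min(mx + mw, x + w) - max(mx, x)
            if overlap >= min_overlap * min(mw, w):
                # Replace the previous box with the union of both boxes.
                nx, ny = min(mx, x), min(my, y)
                nx2 = max(mx + mw, x + w)
                ny2 = max(my + mh, y + h)
                merged[-1] = (nx, ny, nx2 - nx, ny2 - ny)
                continue
        merged.append((x, y, w, h))
    return merged


# A letter, then the dot and stem of an 'i', then another letter.
boxes = [(0, 2, 5, 10), (8, 0, 2, 2), (8, 4, 2, 8), (14, 2, 6, 10)]
print(merge_stacked_boxes(boxes))
# [(0, 2, 5, 10), (8, 0, 2, 12), (14, 2, 6, 10)]
```

This only addresses components stacked vertically; it does nothing for characters that touch, and it would also merge marks such as the two dots of a colon.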
Both of these attempts fail when two characters are too close to each other (or touching).
Do you have any suggestions? I'm trying to find a way to combine the two approaches, but so far without success.
Note: this is for learning purposes. That's why I don't want to use existing solutions, except for OpenCV for basic image processing. The K-Nearest Neighbor classifier is mandatory, since it's the main part of this OCR system.