
Fooling around with OCR.

I have a set of binary images of numbers 0-9 that can be used as training data, and another set of unknown numbers within the same range. I want to be able to classify the numbers in the unknown set by using the k nearest neighbour algorithm.

I've done some reading on the algorithm. The approach I've seen recommended is to pick quantitative characteristics (features) of each image, plot every training image as a point in a feature space that has those characteristics as its axes, do the same for each image in the unknown set, and then use the k nearest neighbour algorithm to find the closest training points, something like what is done here.

What characteristics would be best suited to something like this?

identicon
  • Usually the pixel intensities in your images, as vectors. – phs Mar 04 '14 at 02:55
  • How do we store it as a vector? (I'm assuming here that when you say vector, you mean a [Euclidean Vector.](http://en.wikipedia.org/wiki/Euclidean_vector)) – identicon Mar 04 '14 at 13:04

1 Answer


In the simplest case, as phs mentioned in his comment, the pixel intensities themselves are used as features. Each image is resized to a standard size such as 20x20 or 10x10, and the whole image is then expressed as a vector of 400 or 100 elements respectively.
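To make the "image as a vector" idea concrete, here is a minimal pure-Python sketch of flattening an image and classifying it with a majority vote over its k nearest training vectors. It is not the OpenCV implementation: the helper names (`flatten`, `squared_distance`, `knn_classify`) and the tiny 3x3 toy images are made up for illustration, and real digits would be resized to something like 20x20 first.

```python
from collections import Counter

def flatten(image):
    """Turn a 2D list of pixel rows into one flat feature vector."""
    return [pixel for row in image for pixel in row]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_classify(train_vectors, train_labels, query, k=3):
    """Return the majority label among the k training vectors closest to query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: squared_distance(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two rough "0" shapes and two rough "1" shapes.
train_images = [
    [[1, 1, 1], [1, 0, 1], [1, 1, 1]],
    [[1, 1, 1], [1, 0, 1], [1, 1, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 1, 1]],
]
train_labels = [0, 0, 1, 1]
train_vectors = [flatten(img) for img in train_images]

# An unknown image that looks like a slightly noisy "1".
unknown = [[0, 1, 0], [0, 1, 0], [1, 1, 0]]
predicted = knn_classify(train_vectors, train_labels, flatten(unknown), k=3)
print(predicted)
```

Each pixel becomes one axis of the feature space, so a 20x20 image is a single point in 400 dimensions, and "nearest" simply means smallest distance between those points.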

Such an example is shown here: Simple Digit Recognition OCR in OpenCV-Python

Alternatively, you can use shape features such as moments, centroid, area, perimeter, Euler number, etc.
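As a rough illustration of a few of these features, the sketch below computes area and centroid from raw image moments and a crude perimeter estimate for a binary image stored as a list of rows. The function names are hypothetical, and the perimeter here is only a count of foreground pixels that touch background or the image edge, which is an assumption of this sketch rather than a standard contour length.

```python
def raw_moment(image, p, q):
    """Raw image moment M_pq: sum over all pixels of x^p * y^q * value."""
    return sum(
        (x ** p) * (y ** q) * value
        for y, row in enumerate(image)
        for x, value in enumerate(row)
    )

def boundary_count(image):
    """Count foreground pixels with a background or out-of-image 4-neighbour."""
    height, width = len(image), len(image[0])
    count = 0
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            neighbours = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
            if any(
                not (0 <= nx < width and 0 <= ny < height) or not image[ny][nx]
                for nx, ny in neighbours
            ):
                count += 1
    return count

def shape_features(image):
    """Feature vector [area, centroid_x, centroid_y, perimeter_estimate]."""
    area = raw_moment(image, 0, 0)
    if area == 0:
        return [0, 0.0, 0.0, 0]
    centroid_x = raw_moment(image, 1, 0) / area
    centroid_y = raw_moment(image, 0, 1) / area
    return [area, centroid_x, centroid_y, boundary_count(image)]

# Hypothetical example: a solid 4x4 block of foreground pixels.
block = [[1, 1, 1, 1] for _ in range(4)]
features = shape_features(block)
print(features)
```

Features like these live on different scales (area grows much faster than the centroid coordinates), so it is usually worth rescaling them before measuring distances in kNN.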

If your image is grayscale, you can go for Histogram of Oriented Gradients (HOG). Here is an example with an SVM; you can try adapting it to kNN: http://docs.opencv.org/trunk/doc/py_tutorials/py_ml/py_svm/py_svm_opencv/py_svm_opencv.html#svm-opencv
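The core idea behind HOG can be sketched in a few lines of pure Python. Note that this is a heavily simplified, single-cell version: it builds one magnitude-weighted histogram of gradient orientations for the whole image, whereas full HOG (and the linked tutorial) splits the image into cells and does more. The function name `orientation_histogram` and the choice of 9 unsigned bins are assumptions of this sketch.

```python
import math

def orientation_histogram(image, bins=9):
    """Normalised histogram of gradient orientations (0-180 degrees) for one image."""
    height, width = len(image), len(image[0])
    histogram = [0.0] * bins
    bin_width = 180.0 / bins
    # Central differences, skipping the one-pixel border.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation: opposite directions fall in the same bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / bin_width), bins - 1)
            histogram[index] += magnitude
    total = sum(histogram)
    return [value / total for value in histogram] if total else histogram

# Hypothetical grayscale patch: dark left half, bright right half.
vertical_edge = [[0, 0, 255, 255] for _ in range(4)]
hog_vector = orientation_histogram(vertical_edge)
print(hog_vector)
```

The resulting list is just another feature vector, so it can be handed to a kNN classifier in place of the raw pixel vector.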

Abid Rahman K
  • I've taken a look at OpenCV's k nearest neighbor algorithm and the example given, and I am a bit confused. In the example, you just passed the image itself to the k nearest neighbor function. How exactly does it predict using just the raw image? – identicon Mar 07 '14 at 00:40
  • I passed the image, but extracted a bounding box around every digit for training, which is equivalent to providing those digits as separate files. Each extracted digit is then resized to 20x20 px and flattened into a vector of 400 pixels, which is given directly to kNN for training. – Abid Rahman K Mar 07 '14 at 03:48
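The preprocessing described in the last comment (bounding box, resize to 20x20, flatten to 400 values) might look roughly like the sketch below. It is pure Python rather than the OpenCV calls used in the linked example, it handles a single digit per image instead of finding several digits in one picture, and all helper names are hypothetical. It also assumes the image contains at least one foreground pixel.

```python
def bounding_box(image):
    """Return (left, top, right, bottom) of the foreground pixels, inclusive."""
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    return min(cols), min(rows), max(cols), max(rows)

def crop(image, box):
    """Cut the image down to the given inclusive bounding box."""
    left, top, right, bottom = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

def resize_nearest(image, size=20):
    """Resize to size x size using nearest-neighbour sampling."""
    height, width = len(image), len(image[0])
    return [
        [image[y * height // size][x * width // size] for x in range(size)]
        for y in range(size)
    ]

def digit_to_vector(image, size=20):
    """Crop to the digit, resize, and flatten into size * size values."""
    patch = resize_nearest(crop(image, bounding_box(image)), size)
    return [pixel for row in patch for pixel in row]

# Hypothetical 5x6 binary image with a small 2x2 blob standing in for a digit.
image = [[0] * 6 for _ in range(5)]
for y, x in [(1, 2), (1, 3), (2, 2), (2, 3)]:
    image[y][x] = 1

vector = digit_to_vector(image)
print(len(vector))
```

Every training digit and every unknown digit goes through the same steps, so all vectors have the same length and can be compared directly by kNN.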