
I just wrote this very simple handwritten digit recognition. Here is an 8 kB archive with the following code + ten .PNG image files. It works: [image of the digit to be recognized] is correctly recognized as [image of the matching database digit].

In short, each digit of the database (50x50 pixels = 2500 coefficients) is summarized into a 10-coefficient vector (by keeping the 10 biggest singular values, see Low-rank approximation with SVD).

Then, for the digit to be recognized, we find the nearest neighbour by minimizing the distance between its singular-value vector and those of the digits in the database.
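In formulas (a compact restatement of the two steps above, with k = 10 singular values kept):

M \approx U \, \mathrm{diag}(s_1, \dots, s_k, 0, \dots, 0) \, V

\hat{d} = \operatorname{arg\,min}_{d \in \{0, \dots, 9\}} \sum_{i=1}^{k} \bigl( s_i^{(d)} - s_i^{(x)} \bigr)^2

where s^{(d)} are the singular values of database digit d and s^{(x)} those of the digit to be recognized.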

from scipy import misc
import numpy as np
import matplotlib.pyplot as plt

digits = []
for i in range(11):
    M = misc.imread(str(i) + '.png', flatten=True)
    U, s, V = np.linalg.svd(M, full_matrices=False)
    s[10:] = 0        # keep the 10 biggest singular values only, discard others
    S = np.diag(s)
    M_reduced = np.dot(U, np.dot(S, V))      # reconstruction of the image from the 10 biggest singular values
    digits.append({'original': M, 'singular': s[:10], 'reduced': M_reduced})

# each 50x50 pixels digit is summarized into a vector of 10 coefficients : the 10 biggest singular values s[:10]    

# 0.png to 9.png = all the digits (for machine training)
# 10.png = the digit to be recognized
toberecognizeddigit = digits[10]    
digits = digits[:10]

# we find the nearest neighbour by minimizing the distance between the singular values of toberecognizeddigit and those of all the digits in the database
recognizeddigit = min(digits, key=lambda d: sum((d['singular'] - toberecognizeddigit['singular'])**2))

plt.imshow(toberecognizeddigit['reduced'], interpolation='nearest', cmap=plt.cm.Greys_r)
plt.show()
plt.imshow(recognizeddigit['reduced'], interpolation='nearest', cmap=plt.cm.Greys_r)
plt.show()

Question:

The code works (you can run it from the ZIP archive), but how can we improve it to get better results? (Mostly mathematical techniques, I imagine.)

For example in my tests, 9 and 3 are sometimes confused with each other.

Basj
  • I had more or less success with distorted and very noisy numbers (not hand-drawn) with this [OCR and character similarity](http://stackoverflow.com/a/22879053/2521214); you can give it a try ... You can create a reference font with more versions per number to ease up the recognition – Spektre Dec 07 '15 at 09:57

1 Answer


Digit recognition can be quite a difficult area, especially when the digits are written in very different or unclear ways. A lot of approaches have been taken to this problem, and entire competitions are dedicated to it. For an example, see Kaggle's digit recognizer competition, which is based on the well-known MNIST data set. In its forums you will find a lot of ideas and approaches to this problem, but I will give some quick suggestions here.

A lot of people approach this as a classification problem. Possible algorithms for such problems include, for example, k-nearest neighbours (kNN), neural networks, or gradient boosting.
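For instance, a minimal kNN sketch (assuming scikit-learn is installed and reusing the digits list and 10-coefficient singular-value vectors from your code; with only one training image per digit, k = 1 is the only sensible choice, but with more examples per digit a larger k would work):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# one row of 10 singular values per training image, labelled with its digit
X_train = np.array([d['singular'] for d in digits])   # shape (10, 10)
y_train = np.arange(10)                                # labels 0..9

knn = KNeighborsClassifier(n_neighbors=1)              # k = 1: plain nearest neighbour
knn.fit(X_train, y_train)

predicted = knn.predict(toberecognizeddigit['singular'].reshape(1, -1))
print(predicted[0])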

However, the algorithm alone is generally not enough to get optimal classification rates. Another important aspect to improve your scores is feature extraction. The idea is to calculate features that make it possible to distinguish between different numbers. Some example features for this dataset might include the number of colored pixels, or the width and the height of the digits.
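As a rough sketch of such features (assuming 50x50 grayscale images with dark digits on a light background, as in the question; the function name and threshold are only illustrative):

import numpy as np

def extract_features(M, threshold=128):
    ink = M < threshold                         # pixels that belong to the digit (dark on a light background)
    n_pixels = ink.sum()                        # number of coloured pixels
    rows = np.where(ink.any(axis=1))[0]         # rows containing ink
    cols = np.where(ink.any(axis=0))[0]         # columns containing ink
    height = rows.max() - rows.min() + 1 if rows.size else 0
    width = cols.max() - cols.min() + 1 if cols.size else 0
    return np.array([n_pixels, width, height])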

Although the other algorithms might not be what you are looking for, it is possible that adding more features can improve the performance of the algorithm you are currently using as well.
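For example, reusing the illustrative extract_features sketch above (in practice you would normalise the features so that none of them dominates the distance):

import numpy as np

# append the extra features to the existing 10-coefficient singular-value vector
for d in digits:
    d['features'] = np.concatenate([d['singular'], extract_features(d['original'])])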

Patrick Kostjens
  • Thanks for these links. I have looked a bit at them. On what data do they do the k-nearest-neighbours algorithm in MNIST? On a vector of size 28*28 = 784 coefficients for each digit? Or do they keep lower-dimensional data like in my code (using SVD + low-rank approximation / PCA)? – Basj Dec 06 '15 at 13:09
  • @Basj, you can indeed do kNN using 784 coefficients (the same holds for the other algorithms). You can also choose to run it just on the features you calculated, or on both your features and the pixels if you want to. There can be big differences in performance, but all combinations are possible. – Patrick Kostjens Dec 06 '15 at 21:05
  • Ok so you mean that dimensionality reduction via SVD / PCA (784 --> 40 or 10 coefficients) is not a "necessary" step in these recognition methods? Is it often used or not? – Basj Dec 06 '15 at 21:08
  • It is not strictly necessary, but it can improve both classification performance and execution times if you do so. If you have enough data, the high number of dimensions is not necessarily a problem, however if you have more dimensions than data, your results generally become useless. In the case of the MNIST data, enough data is available to use 784 dimensions without getting very bad classifications, but this is not the case for every dataset. So, no it is not necessary, but dimension reduction is probably used a lot (in case of MNIST, there are a lot of empty dimensions). – Patrick Kostjens Dec 06 '15 at 21:14
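
For reference, a minimal sketch of the PCA-plus-kNN combination discussed in these comments (assuming scikit-learn; X is a placeholder for an (n_samples, 784) array of MNIST-style pixel vectors and y for the corresponding digit labels):

from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# reduce the 784 pixel dimensions to 40 components, then classify with kNN
model = make_pipeline(PCA(n_components=40), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
# model.predict(X_new) would then classify new 784-pixel digit vectors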