
I am new to machine learning, so as a first project I tried to build a handwritten digit recognition neural network based on the MNIST dataset, and when I test it with the test images provided by the dataset itself it seems to work pretty well (that's what the function test_predict is for). Now I would like to step it up and have the network recognise some actual handwritten digits that I've taken photos of. The function partial_img_rec takes an image containing multiple digits and is called by multiple_digits. I know it might seem weird that I use recursion here, and I'm sure there are more efficient ways to do this, but that's not the point. In order to test partial_img_rec I provide some photos of individual digits that are stored in the folder .\individual_test and they all look something like this:
[image: 1_digit.jpg, a photo of a handwritten 1]

The problem is: my neural network's prediction for every single one of my test images is "5", and the probability is always around 22%, no matter which digit is actually shown. I totally get why the results are not as good as those achieved with the MNIST dataset's test images, but I certainly didn't expect this. Do you have any idea why this is happening? Any advice is welcome. Thank you in advance.

Here's my code (edited, now working):

# import keras and the MNIST dataset
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
# numpy is necessary since keras uses numpy arrays
import numpy as np

# imports for pictures and plotting
from PIL import Image
from PIL import ImageOps
import matplotlib.pyplot as plt

# imports for tests
import random
import os

class mnist_network():
    def __init__(self):
        """ load data, create and train model """
        # load data
        (X_train, y_train), (X_test, y_test) = mnist.load_data()
        # flatten 28*28 images to a 784 vector for each image
        num_pixels = X_train.shape[1] * X_train.shape[2]
        X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
        X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
        # normalize inputs from 0-255 to 0-1
        X_train = X_train / 255
        X_test = X_test / 255
        # one hot encode outputs
        y_train = to_categorical(y_train)
        y_test = to_categorical(y_test)
        num_classes = y_test.shape[1]


        # create model
        self.model = Sequential()
        self.model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
        self.model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
        # Compile model
        self.model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

        # train the model
        self.model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)

        self.train_img = X_train
        self.train_res = y_train
        self.test_img = X_test
        self.test_res = y_test


    def test_all(self):
        """ evaluates the success rate using all the test data """
        scores = self.model.evaluate(self.test_img, self.test_res, verbose=0)
        print("Baseline Error: %.2f%%" % (100-scores[1]*100))

    def predict_result(self, img, num_pixels = None, show=False):
        """ predicts the number in a picture (vector) """
        assert type(img) == np.ndarray and img.shape == (784,)

        """if show:
            # show the picture!!!! some problem here
            plt.imshow(img, cmap='Greys')
            plt.show()"""

        num_pixels = img.shape[0]
        # predict once and reuse the probabilities
        res_probabilities = self.model.predict(img.reshape(-1, num_pixels))
        # the actual number is the class with the highest probability
        res_number = np.argmax(res_probabilities, axis=1)

        return (res_number[0], res_probabilities.tolist()[0])    # we only need the first element since there is only one image

    def test_predict(self, amount_test = 100):
        """ test some random numbers from the test part of the data set """
        assert type(amount_test) == int and amount_test <= 10000
        cnt_right = 0
        cnt_wrong = 0

        for i in range(amount_test):
            ind = random.randrange(0,10000) # there are 10000 images in the test part of the data set
            """ correct_res is the actual result stored in the data set 
                It's represented as a list of 10 elements one of which being 1, the rest 0 """
            correct_list = self.test_res.tolist()
            correct_list = correct_list[ind] # the correct sublist
            correct_res = correct_list.index(1.0)


            predicted_res = self.predict_result(self.test_img[ind])[0]

            if correct_res != predicted_res:
                cnt_wrong += 1
                print("Error in predict ! \
                      index = ", ind, " predicted result = ", predicted_res, " correct result = ", correct_res)
            else:
                cnt_right += 1

        print("The machine predicted correctly ",cnt_right," out of ",amount_test," examples. That is a success rate of ", (cnt_right/amount_test)*100,"%.")

    def partial_img_rec(self, image, upper_left, lower_right, results=None):
        """ partial is a part of an image """
        # avoid the mutable-default-argument pitfall: create a fresh list per top-level call
        if results is None:
            results = []
        left_x, left_y = upper_left
        right_x, right_y = lower_right

        print("current test part: ", upper_left, lower_right)
        print("results: ", results)
        # condition to stop recursion: we've reached the full width of the picture
        width, height = image.size
        if right_x > width:
            return results

        partial = image.crop((left_x, left_y, right_x, right_y))
        # rescale image to 28x28, the dimension the network was trained on
        partial = partial.resize((28, 28), Image.LANCZOS)   # Image.ANTIALIAS is deprecated/removed in newer Pillow

        partial.show()
        # transform to vector
        partial =  ImageOps.invert(partial)
        partial = np.asarray(partial, "float32")
        partial = partial / 255.
        partial[partial < 0.5] = 0.
        # flatten image to 28*28 = 784 vector
        num_pixels = partial.shape[0] * partial.shape[1]
        partial = partial.reshape(num_pixels)

        # is there a number in this part of the image?

        res, prop = self.predict_result(partial)
        print("result: ", res, ". probabilities: ", prop)
        # only count this result if the network is >= 50% sure
        if prop[res] >= 0.5:        
            results.append(res)
            # step is 80% of the partial image's size (which is equivalent to the original image's height) 
            step = int(height * 0.8)
            print("found valid result")
        else:
            # if there is no number found we take smaller steps
            step = height // 20 
        print("step: ", step)
        # recursive call with modified positions ( move on step variables )
        return self.partial_img_rec(image, (left_x+step, left_y), (right_x+step, right_y), results=results)

    def test_individual_digits(self):
        """ test partial_img_rec with some individual digits (square shaped images) 
            saved in the folder 'individual_test' following the pattern 'number_digit.jpg' """
        cnt_right, cnt_wrong = 0,0
        folder_content = os.listdir(r".\individual_test")

        for imageName in folder_content:
            # image file must be a jpg or png
            assert imageName[-4:] == ".jpg" or imageName[-4:] == ".png"
            correct_res = int(imageName[0])
            image = Image.open(".\\individual_test\\" + imageName).convert("L")
            # only square images in this test
            if image.size[0]  != image.size[1]:
                print(imageName, " has the wrong proportions: ", image.size,". It has to be a square.")
                continue 
            predicted_res = self.partial_img_rec(image, (0,0), (image.size[0], image.size[1]), results=[])

            if predicted_res == []:
                print("No prediction possible for ", imageName)
                cnt_wrong += 1
                continue
            predicted_res = predicted_res[0]

            if predicted_res != correct_res:
                print("error in partial_img_rec! Predicted ", predicted_res, ". The correct result would have been ", correct_res)
                cnt_wrong += 1
            else:
                cnt_right += 1
                print("correctly predicted ",imageName)
        print(cnt_right, " out of ", cnt_right + cnt_wrong," digits were correctly recognised. The success rate is therefore ", (cnt_right / (cnt_right + cnt_wrong)) * 100," %.")

    def multiple_digits(self, img):
        """ takes as input an image without unnecessary whitespace surrounding the digits """
        #assert type(img) == myImage
        width, height = img.size
        # start with the first quadratic part of the image
        res_list = self.partial_img_rec(img, (0,0),(height ,height))
        res_str =""
        for elem in res_list:
            res_str += str(elem)
        return res_str




network = mnist_network()    
network.test_individual_digits()        
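
# Hypothetical usage of multiple_digits, which is never called above; the file name
# "digits_photo.jpg" is only a placeholder for a photo containing several digits.
if os.path.exists("digits_photo.jpg"):
    photo = Image.open("digits_photo.jpg").convert("L")
    print("multiple_digits read: ", network.multiple_digits(photo))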

EDIT

@Geeocode's answer was very helpful and the network now correctly predicts some of the pictures, including the one shown above. Yet the overall success rate is still below 50%. Do you have any ideas how to improve this?

Examples of images returning bad results:

[images: photos of a handwritten 6 and a handwritten 9]

Johanna
  • Your images are just too different from MNIST test images, so the classifier won't work at all with them. The MNIST dataset is not meant for this, it's just an academic benchmark. – Dr. Snoopy Dec 28 '19 at 23:49
  • Maybe try creating your own data set and training your model on it... Or apply morphological filters (using OpenCV) to your input images to make them resemble the images in the MNIST data set (a rough sketch of this idea follows after these comments)... – Sabito stands with Ukraine Dec 29 '19 at 01:22
  • I agree with the above statement unless someone has a different view. The photos that you took might not have the same resolution, size or whatever. You need more data, or try to make the properties of your photos match the dataset you have. – Naveen Gabriel Dec 29 '19 at 03:35
  • @MatiasValdenegro This photo with "1" is not so far from MNIST test images. So the model should work with it. – Geeocode Dec 29 '19 at 10:07
  • @Geeocode it's a color image, very far from MNIST which is grayscale with a black background – Dr. Snoopy Dec 29 '19 at 10:08
  • @MatiasValdenegro True, but if you look at OP's code you can see that she did some minor preparation, which is normal, as the MNIST data was made with some preparation as well before training; in any task this is the first step, i.e. image preprocessing, as you know for sure as well. – Geeocode Dec 29 '19 at 10:12
  • @NaveenGabriel OP did some preprocessing, e.g. regarding sizing, but of course there can be a limit where the original model doesn't work well, though there are preprocessing techniques with which we can easily adjust our raw images in bulk to the original model without retraining it. – Geeocode Dec 29 '19 at 10:21
  • Lower than 50%: what does that mean exactly? About 50%, like 49.5%? Or more like 35%? – Geeocode Dec 29 '19 at 17:14
  • @Geeocode Well, I filled my folder "individual_test" with some images and about 35 to 50% of them are recognised correctly. – Johanna Dec 29 '19 at 20:43
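
Following up on the OpenCV suggestion above, here is a rough sketch of what such morphological preprocessing could look like (the threshold method and kernel size are assumptions that would need tuning for your photos):

import cv2
import numpy as np

def mnist_like_opencv(path):
    """ roughly push a photographed digit towards the MNIST style """
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # binarise and invert: white digit on black background, like MNIST
    _, img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # thicken the strokes slightly, since MNIST digits have fairly bold strokes
    kernel = np.ones((3, 3), np.uint8)
    img = cv2.dilate(img, kernel, iterations=1)
    # scale down to the MNIST input size
    img = cv2.resize(img, (28, 28), interpolation=cv2.INTER_AREA)
    return img.astype("float32").reshape(784) / 255.

The returned 784-vector can then be passed straight to predict_result.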

1 Answer


There is nothing wrong with your image in itself; your model can classify it correctly.

The issue is that you applied a floor division to your partial:

partial = partial // 255

which always results in 0. So you always get a black image.
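
For instance (a quick check using only NumPy), floor dividing pixel values in the 0-255 range by 255 maps everything below 255 to 0, while true division keeps the grayscale information:

import numpy as np

pixels = np.array([0, 30, 128, 254, 255], dtype="float32")
print(pixels // 255)   # [0. 0. 0. 0. 1.]                  -> essentially a black image
print(pixels / 255)    # roughly [0. 0.118 0.502 0.996 1.] -> grayscale preserved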

You have to do a "normal" division and some preparation, because your model was trained on black i.e. 0. valued pixel backgrounded negative images:

# transform to vector
partial = ImageOps.invert(partial)
partial = np.asarray(partial, "float32")
partial = partial / 255.
partial[partial < 0.5] = 0.

After that your model will classify it correctly:

Out:

result:  1 . probabilities:  [0.000431705528171733, 0.7594985961914062, 0.0011404436081647873, 0.00018972357793245465, 0.03162384033203125, 0.008697531186044216, 0.0014472954208031297, 0.18429973721504211, 0.006838776171207428, 0.005832481198012829]
found valid result

Note that of course you can still experiment with the image preparation; that was not the purpose of this answer.
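
One possible preparation along these lines (just an illustrative sketch; the 0.5 threshold is a guess): MNIST digits are size-normalised to about 20x20 pixels and then centred in the 28x28 frame, so cropping the photographed digit to its bounding box and re-centring it usually brings it closer to the training distribution.

import numpy as np
from PIL import Image, ImageOps

def mnist_like(path, threshold=0.5):
    """ turn a photo of a single digit into a centred 28x28 vector with values in [0, 1] """
    img = ImageOps.invert(Image.open(path).convert("L"))    # white digit on black background
    arr = np.asarray(img, "float32") / 255.
    arr[arr < threshold] = 0.                               # suppress background noise

    # crop to the bounding box of the digit
    ys, xs = np.nonzero(arr)
    if len(xs) == 0:
        return None                                         # nothing recognisable in the image
    arr = arr[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

    # shrink the digit so it fits into a 20x20 box, keeping its aspect ratio
    digit = Image.fromarray((arr * 255).astype("uint8"))
    digit.thumbnail((20, 20), Image.LANCZOS)

    # paste it into the centre of a black 28x28 canvas, as in the MNIST preprocessing
    canvas = Image.new("L", (28, 28), 0)
    canvas.paste(digit, ((28 - digit.size[0]) // 2, (28 - digit.size[1]) // 2))
    return np.asarray(canvas, "float32").reshape(784) / 255.

The resulting vector has the same shape and value range as the training data and can be fed directly into predict_result.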

Update: For my detailed answer on how to achieve better performance in this task, see here.

Geeocode