I am trying to build a multi-layer perceptron to classify a dataset of hand-drawn digits from the MNIST database. It has two hidden layers with a sigmoid activation function, while the output layer uses softmax. However, I am unable to get it to work. I have attached the training loop from my code below, which I am fairly confident is where the problem stems from. Can anyone identify possible issues with my implementation of the perceptron?

    def train(self, inputs, targets, eta, niterations):
        """
        inputs is a numpy array of shape (num_train, D) containing the training images
                    consisting of num_train samples each of dimension D.

        targets is a numpy array of shape (num_train, num_classes) containing the
                    one-hot training labels, one row per training sample.

        eta is the learning rate for optimization 
        niterations is the number of iterations for updating the weights 

        """
        ndata = np.shape(inputs)[0]  # number of data samples
        # adding the bias
        inputs = np.concatenate((inputs, -np.ones((ndata, 1))), axis=1)

        # numpy arrays to store the weight updates (used for momentum)
        updatew1 = np.zeros((np.shape(self.weights1)))
        updatew2 = np.zeros((np.shape(self.weights2)))
        updatew3 = np.zeros((np.shape(self.weights3)))

        for n in range(niterations):

            # forward phase
            self.outputs = self.forwardPass(inputs)

            # Error using the sum-of-squares error function
            error = 0.5*np.sum((self.outputs-targets)**2)

            if (np.mod(n, 100) == 0):
                print("Iteration: ", n, " Error: ", error)

            # backward phase
            deltao = self.outputs - targets
            placeholder = np.zeros(np.shape(self.outputs))
            for j in range(np.shape(self.outputs)[1]):
                y = self.outputs[:, j]
                placeholder[:, j] = y * (1 - y)
                for y in range(np.shape(self.outputs)[1]):
                    if not y == j:
                        placeholder[:, j] += -y * self.outputs[:, y]
            deltao *= placeholder
            # compute the derivative of the second hidden layer
            deltah2 = np.dot(deltao, np.transpose(self.weights3))
            deltah2 = self.hidden2*self.beta*(1.0-self.hidden2)*deltah2
            # compute the derivative of the first hidden layer
            deltah1 = np.dot(deltah2[:, :-1], np.transpose(self.weights2))
            deltah1 = self.hidden1*self.beta*(1.0-self.hidden1)*deltah1
            # update the weights of the three layers: self.weights1, self.weights2 and self.weights3
            updatew1 = eta*(np.dot(np.transpose(inputs),deltah1[:, :-1])) + (self.momentum * updatew1)
            updatew2 = eta*(np.dot(np.transpose(self.hidden1),deltah2[:, :-1])) + (self.momentum * updatew2)
            updatew3 = eta*(np.dot(np.transpose(self.hidden2),deltao)) + (self.momentum * updatew3)

            self.weights1 -= updatew1
            self.weights2 -= updatew2
            self.weights3 -= updatew3

    def forwardPass(self, inputs):
        """
            inputs is a numpy array of shape (num_train, D) containing the training images
                    consisting of num_train samples each of dimension D.  
        """
        # layer 1
        # the forward pass on the first hidden layer with the sigmoid function
        self.hidden1 = np.dot(inputs, self.weights1)
        self.hidden1 = 1.0/(1.0+np.exp(-self.beta*self.hidden1))
        self.hidden1 = np.concatenate((self.hidden1, -np.ones((np.shape(self.hidden1)[0], 1))), axis=1)
        # layer 2
        # the forward pass on the second hidden layer with the sigmoid function
        self.hidden2 = np.dot(self.hidden1, self.weights2)
        self.hidden2 = 1.0/(1.0+np.exp(-self.beta*self.hidden2))
        self.hidden2 = np.concatenate((self.hidden2, -np.ones((np.shape(self.hidden2)[0], 1))), axis=1)

        # output layer
        # the forward pass on the output layer with softmax function
        outputs = np.dot(self.hidden2, self.weights3)
        outputs = np.exp(outputs)
        outputs /= np.repeat(np.sum(outputs, axis=1),outputs.shape[1], axis=0).reshape(outputs.shape)
        return outputs
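
For completeness, the attributes used above (`self.weights1`, `self.weights2`, `self.weights3`, `self.beta`, `self.momentum`) are set up in the constructor. The sketch below is a simplified stand-in for my actual initialization (the hidden-layer sizes and the random initialization scheme are just illustrative), included so the shapes in `train` and `forwardPass` make sense:

    import numpy as np

    class MLP:
        def __init__(self, inputs, targets, nhidden1, nhidden2, beta=1.0, momentum=0.9):
            nin = np.shape(inputs)[1]     # input dimension D
            nout = np.shape(targets)[1]   # number of classes
            self.beta = beta              # slope of the sigmoid
            self.momentum = momentum      # momentum coefficient for the weight updates
            # each weight matrix has an extra row for the bias unit that
            # train/forwardPass append to its input
            self.weights1 = (np.random.rand(nin + 1, nhidden1) - 0.5) * 2 / np.sqrt(nin)
            self.weights2 = (np.random.rand(nhidden1 + 1, nhidden2) - 0.5) * 2 / np.sqrt(nhidden1)
            self.weights3 = (np.random.rand(nhidden2 + 1, nout) - 0.5) * 2 / np.sqrt(nhidden2)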

Update: I have since figured out something that I messed up in the backpropagation of the softmax layer. The actual deltao should be:

            deltao = self.outputs - targets
            placeholder = np.zeros(np.shape(self.outputs))
            for j in range(np.shape(self.outputs)[1]):
                y = self.outputs[:, j]
                placeholder[:, j] = y * (1 - y)
                # note: the counter for this inner loop used to also be named y, which caused the confusion
                for i in range(np.shape(self.outputs)[1]):
                    if not i == j:
                        placeholder[:, j] += -y * self.outputs[:, i]
            deltao *= placeholder

After this correction the overflow errors seem to have sorted themselves out. However, there is now a new problem: no matter what I try, the accuracy of the perceptron does not exceed 15%, regardless of which hyperparameters I change.
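
(For reference, the accuracy I quote is the fraction of validation samples where the arg-max of the network output matches the arg-max of the one-hot label, roughly as below; `valid_inputs`, `valid_targets` and `net` are placeholder names, not my exact validation code:)

    # append the bias column, exactly as train() does, before the forward pass
    valid = np.concatenate((valid_inputs, -np.ones((np.shape(valid_inputs)[0], 1))), axis=1)
    outputs = net.forwardPass(valid)
    predictions = np.argmax(outputs, axis=1)
    labels = np.argmax(valid_targets, axis=1)
    accuracy = np.mean(predictions == labels)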

Second Update: After a long time I have finally found a way to get my code to work. I had to change the backpropagation of the softmax layer (called deltao in the code) to the following:

    deltao = np.exp(self.outputs)
    deltao /= np.repeat(np.sum(deltao, axis=1), deltao.shape[1]).reshape(deltao.shape)
    deltao = deltao * (1 - deltao)
    deltao *= (self.outputs - targets) / np.shape(inputs)[0]

The only problem is that I have no idea why this works as a derivative of softmax. Could anyone explain this?
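
For comparison, my understanding from reading around is that if the sum-of-squares error were replaced with the categorical cross-entropy loss, the softmax Jacobian would cancel against the loss gradient and the output delta would reduce to just the averaged prediction error. A sketch of that standard result (not something I have verified in my own network yet):

    import numpy as np

    def crossentropy_output_delta(outputs, targets):
        # Output-layer delta for softmax outputs trained with the categorical
        # cross-entropy loss E = -sum(targets * log(outputs)) / N: the softmax
        # Jacobian cancels and only the (scaled) prediction error remains.
        return (outputs - targets) / np.shape(outputs)[0]

If that is right, it would also line up with the suggestion in the comments about using categorical cross entropy instead of MSE.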

  • Could you also post the error that you're getting? – user6386471 Nov 26 '20 at 14:09
  • An overflow occurs "RuntimeWarning: overflow encountered in exp", this applies to all the sigmoid functions and SoftMax during the forward phase. I was able to get rid of this by lowering the learning rate from 0.1 to 0.00001 but then during validation I get accuracies in the ball park of 11%. I also tried normalizing the exp functions but to no avail. – Bad_coder Nov 27 '20 at 06:15
  • So it seems like the issue is that your input to `np.exp()` is very large, raising the overflow error. This is happening because your weights are very large, and this is usually due to the phenomenon of exploding gradients, however this is more common in deeper NN and shouldn't be happening in your MLP. By reducing the learning rate by several orders of magnitude your weight updates are more modest and therefore the exponentiation goes ahead without any issue. It would be worth checking how the weights get updated in each epoch and see how many epochs pass before it blows up. – user6386471 Nov 28 '20 at 17:44
  • Just saw your update, and noticed that you are using the MSE loss whereas you should be using something like the categorical cross entropy loss. – user6386471 Nov 30 '20 at 14:32
