I am trying to build a multi-layer perceptron to classify a dataset of hand-drawn digits from the MNIST database. It has two hidden layers with a sigmoid activation function, while the output layer uses SoftMax. However, I cannot get it to train correctly. I have attached the training loop from my code below; I am fairly confident this is where the problem stems from. Can anyone identify possible issues with my implementation of the perceptron?
def train(self, inputs, targets, eta, niterations):
    """
    inputs is a numpy array of shape (num_train, D) containing the training images
    consisting of num_train samples each of dimension D.
    targets is a numpy array of shape (num_train, num_classes) containing the
    one-hot training labels for the same num_train samples.
    eta is the learning rate for optimization.
    niterations is the number of iterations for updating the weights.
    """
    ndata = np.shape(inputs)[0]  # number of data samples
    # add the bias input as an extra column of -1s
    inputs = np.concatenate((inputs, -np.ones((ndata, 1))), axis=1)
    # numpy arrays to store the previous weight updates (for momentum)
    updatew1 = np.zeros((np.shape(self.weights1)))
    updatew2 = np.zeros((np.shape(self.weights2)))
    updatew3 = np.zeros((np.shape(self.weights3)))
    for n in range(niterations):
        # forward phase
        self.outputs = self.forwardPass(inputs)
        # error using the sum-of-squares error function
        error = 0.5*np.sum((self.outputs-targets)**2)
        if (np.mod(n, 100) == 0):
            print("Iteration: ", n, " Error: ", error)
        # backward phase
        deltao = self.outputs - targets
        placeholder = np.zeros(np.shape(self.outputs))
        for j in range(np.shape(self.outputs)[1]):
            y = self.outputs[:, j]
            placeholder[:, j] = y * (1 - y)
            for y in range(np.shape(self.outputs)[1]):
                if not y == j:
                    placeholder[:, j] += -y * self.outputs[:, y]
        deltao *= placeholder
        # compute the derivative of the second hidden layer
        # (derivative of sigmoid(beta*x) is beta*s*(1-s))
        deltah2 = np.dot(deltao, np.transpose(self.weights3))
        deltah2 = self.hidden2*self.beta*(1.0-self.hidden2)*deltah2
        # compute the derivative of the first hidden layer
        deltah1 = np.dot(deltah2[:, :-1], np.transpose(self.weights2))
        deltah1 = self.hidden1*self.beta*(1.0-self.hidden1)*deltah1
        # update the weights of the three layers:
        # self.weights1, self.weights2 and self.weights3
        updatew1 = eta*(np.dot(np.transpose(inputs), deltah1[:, :-1])) + (self.momentum * updatew1)
        updatew2 = eta*(np.dot(np.transpose(self.hidden1), deltah2[:, :-1])) + (self.momentum * updatew2)
        updatew3 = eta*(np.dot(np.transpose(self.hidden2), deltao)) + (self.momentum * updatew3)
        self.weights1 -= updatew1
        self.weights2 -= updatew2
        self.weights3 -= updatew3
def forwardPass(self, inputs):
    """
    inputs is a numpy array of shape (num_train, D) containing the training images
    consisting of num_train samples each of dimension D.
    """
    # layer 1
    # the forward pass on the first hidden layer with the sigmoid function
    self.hidden1 = np.dot(inputs, self.weights1)
    self.hidden1 = 1.0/(1.0+np.exp(-self.beta*self.hidden1))
    self.hidden1 = np.concatenate((self.hidden1, -np.ones((np.shape(self.hidden1)[0], 1))), axis=1)
    # layer 2
    # the forward pass on the second hidden layer with the sigmoid function
    self.hidden2 = np.dot(self.hidden1, self.weights2)
    self.hidden2 = 1.0/(1.0+np.exp(-self.beta*self.hidden2))
    self.hidden2 = np.concatenate((self.hidden2, -np.ones((np.shape(self.hidden2)[0], 1))), axis=1)
    # output layer
    # the forward pass on the output layer with the softmax function
    outputs = np.dot(self.hidden2, self.weights3)
    outputs = np.exp(outputs)
    outputs /= np.repeat(np.sum(outputs, axis=1), outputs.shape[1], axis=0).reshape(outputs.shape)
    return outputs
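As an aside, the softmax above exponentiates the raw activations directly, which can overflow for large activations (I was seeing overflow errors before the fix described below). A minimal sketch of the standard max-subtraction trick, not my original code, but usable as a drop-in replacement for the last three lines of forwardPass:

import numpy as np

def stable_softmax(z):
    # softmax is invariant to shifting each row by a constant, so
    # subtracting the per-row maximum before exponentiating gives the
    # same result while keeping np.exp from overflowing
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)

With that helper, the output layer would read outputs = stable_softmax(np.dot(self.hidden2, self.weights3)).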
Update: I have since found a mistake I made in the backpropagation of the SoftMax layer. The actual deltao should be:
deltao = self.outputs - targets
placeholder = np.zeros(np.shape(self.outputs))
for j in range(np.shape(self.outputs)[1]):
    y = self.outputs[:, j]
    placeholder[:, j] = y * (1 - y)
    # the counter for the loop below used to also be named y, causing confusion
    for i in range(np.shape(self.outputs)[1]):
        if not i == j:
            placeholder[:, j] += -y * self.outputs[:, i]
deltao *= placeholder
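For reference, the derivative I am trying to implement here is the softmax Jacobian, which as I understand it is

$$\frac{\partial y_j}{\partial z_i} = y_j\,(\delta_{ij} - y_i),$$

so with the sum-of-squares error $E = \frac{1}{2}\sum_j (y_j - t_j)^2$ the delta for logit $z_i$ should be the full Jacobian product $\delta_i = \sum_j (y_j - t_j)\, y_j\,(\delta_{ij} - y_i)$.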
After this correction the overflow errors seem to have sorted themselves out. However, there is now a new problem: no matter what variables I change, the accuracy of the perceptron never exceeds 15%.
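In case it helps with debugging, one check I can run is to compare the analytic gradient against a finite-difference estimate. A minimal sketch for the output-layer weights, assuming net is an instance of this class and inputs already has the bias column appended as in train:

def numerical_grad_w3(net, inputs, targets, eps=1e-5):
    # finite-difference gradient of the sum-of-squares error with respect
    # to weights3, for comparison with the analytic term
    # np.dot(np.transpose(net.hidden2), deltao)
    grad = np.zeros_like(net.weights3)
    for idx in np.ndindex(*net.weights3.shape):
        old = net.weights3[idx]
        net.weights3[idx] = old + eps
        e_plus = 0.5*np.sum((net.forwardPass(inputs) - targets)**2)
        net.weights3[idx] = old - eps
        e_minus = 0.5*np.sum((net.forwardPass(inputs) - targets)**2)
        net.weights3[idx] = old  # restore the original weight
        grad[idx] = (e_plus - e_minus) / (2*eps)
    return grad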
Second Update: After a long time I have finally found a way to get my code to work. I had to change the backpropagation of SoftMax (called deltao in the code) to the following:
deltao = np.exp(self.outputs)
deltao /= np.repeat(np.sum(deltao, axis=1), deltao.shape[1]).reshape(deltao.shape)
deltao = deltao * (1 - deltao)
deltao *= (self.outputs - targets) / np.shape(inputs)[0]
The only problem is that I have no idea why this works as a derivative of SoftMax. Could anyone explain this?
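For comparison, the shortcut I keep seeing in textbooks applies to softmax combined with a cross-entropy loss rather than sum-of-squares, where the Jacobian terms cancel and the output delta collapses to simply (using the same names as above):

deltao = (self.outputs - targets) / np.shape(inputs)[0]

which is suspiciously close to what my working version computes, apart from the extra exponentiation/normalisation and the deltao * (1 - deltao) factor.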