
I'm trying to build an XOR neural network in Python with one hidden layer, but I'm hitting a dimension problem, and I can't figure out why the dimensions are wrong in the first place, because the math looks correct to me.

The dimension issue starts in the backpropagation part (the offending line is marked with a comment in the code). The specific error is:

  File "nn.py", line 52, in <module>
    d_a1_d_W1 = inp * deriv_sigmoid(z1) 
  File "/usr/local/lib/python3.7/site-packages/numpy/matrixlib/defmatrix.py", line 220, in __mul__
    return N.dot(self, asmatrix(other))
ValueError: shapes (1,2) and (3,1) not aligned: 2 (dim 1) != 3 (dim 0)

Additionally, why does the deriv_sigmoid function here only work if I cast to a numpy array?

Code:


import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = np.array(sigmoid(x)) # gives dimension issues unless I cast to array
    return fx * (1 - fx)

hiddenNeurons = 3
outputNeurons = 1
inputNeurons = 2

X = np.array([[0, 1]])
elem = np.matrix(X[0])
elem_row, elem_col = elem.shape


y = np.matrix([1])

W1 = np.random.rand(hiddenNeurons, elem_col)
b1 = np.random.rand(hiddenNeurons, 1)
W2 = np.random.rand(outputNeurons, hiddenNeurons)
b2 = np.random.rand(outputNeurons, 1)
lr = .01



for inp, ytrue in zip(X, y):
    inp = np.matrix(inp)

    # feedforward
    z1 = W1 * inp.T + b1 # get weight matrix1 * inputs + bias1
    a1 = sigmoid(z1) # get activation of hidden layer

    z2 = W2 * a1 + b2 # get weight matrix2 * activated hidden + bias 2
    a2 = sigmoid(z2) # get activated output 
    ypred = a2 # and call it ypred (y prediction)

    # backprop
    d_L_d_ypred = -2 * (ytrue - ypred) # derivative of mean squared error loss

    d_ypred_d_W2 = a1 * deriv_sigmoid(z2) # derivative of y prediction with respect to weight matrix 2
    d_ypred_d_b2 = deriv_sigmoid(z2) # derivative of y prediction with respect to bias 2

    d_ypred_d_a1 = W2 * deriv_sigmoid(z2) # derivative of y prediction with respect to hidden activation

    d_a1_d_W1 = inp * deriv_sigmoid(z1) # dimensions issue starts here ––––––––––––––––––––––––––––––––

    d_a1_d_b1 = deriv_sigmoid(b1) 

    W1 -= lr * d_L_d_ypred * d_ypred_d_a1 * d_a1_d_W1
    b1 -= lr * d_L_d_ypred * d_ypred_d_a1 * d_a1_d_b1
    W2 -= lr * d_L_d_ypred * d_ypred_d_W2
    b2 -= lr * d_L_d_ypred * d_ypred_d_b2


n_1
  • Is the use of numpy matrices strictly necessary? It probably isn't the sole cause of the issue, but the general consensus seems to be that [ndarray is the better choice](https://stackoverflow.com/q/4151128/11301900). The [docs](https://docs.scipy.org/doc/numpy/reference/generated/numpy.matrix.html) state: "It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future." – AMC Oct 13 '19 at 18:29
  • Thanks. I actually already tried to replace everything with np.array, but still getting the same error. – n_1 Oct 13 '19 at 18:52
  • Alright, i'll try to take a look at the code :) I don't know much about neural networks though, so no promises! – AMC Oct 13 '19 at 19:22

1 Answer


I've never worked with neural networks, so I don't fully understand exactly what you are trying to do.

I'd guess there's some confusion about how a * b works when a and b are np.matrix objects rather than numpy arrays. On numpy arrays, * does element-wise multiplication; on np.matrix objects, it does matrix multiplication.

a = np.array([[1, 2], [3, 4]])
b = a - 1
print(b) 
# array([[0, 1],
#        [2, 3]])

a*b     # Element-wise multiplication
# array([[ 0,  2],     [[ 1*0, 2*1 ], 
#        [ 6, 12]])     [ 3*2, 4*3 ]]

am = np.matrix(a)
bm = np.matrix(b)

am * bm  # Matrix (dot) multiplication
# matrix([[ 4,  7],    [[ 1*0+2*2, 1*1+2*3],
#         [ 8, 15]])    [ 3*0+4*2, 3*1+4*3]]

In the deriv_sigmoid function (without the np.array cast), if x is a matrix then fx is a matrix with the same shape, (3, 1). When fx is a (3, 1) matrix, fx * (1 - fx) raises an exception, because two (3, 1) matrices can't be matrix-multiplied together.
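To make that concrete, here is a minimal sketch with a stand-in (3, 1) value (the real fx would come from sigmoid(z1), but any (3, 1) matrix behaves the same way):

import numpy as np

fx_mat = np.matrix(np.random.rand(3, 1))   # a (3, 1) np.matrix
try:
    fx_mat * (1 - fx_mat)                  # matrix product of (3,1) and (3,1)
except ValueError as e:
    print(e)                               # shapes (3,1) and (3,1) not aligned

fx_arr = np.asarray(fx_mat)                # plain ndarray of the same shape
print(fx_arr * (1 - fx_arr))               # element-wise product, works fine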

The same issue applies in the '# backprop' part of the code.

d_ypred_d_a1 = W2 * deriv_sigmoid(z2) # derivative of y prediction with respect to hidden activation
# W2 * deriv_sigmoid(z2) fails as shapes are incompatible with matrix multiplication.    
# deriv_sigmoid(z2) * W2 would work, but I guess would return incorrect values (and shape).

d_a1_d_W1 = inp * deriv_sigmoid(z1)
# This fails for the same reason: the shapes of inp and deriv_sigmoid(z1), (1, 2) and (3, 1), are incompatible for matrix multiplication.
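For reference, the traceback in the question can be reproduced from just the two shapes involved: inp is a (1, 2) np.matrix and deriv_sigmoid(z1) has shape (3, 1) (the random values below are only stand-ins):

import numpy as np

inp = np.matrix([[0, 1]])      # shape (1, 2), as in the question
dz1 = np.random.rand(3, 1)     # stand-in for deriv_sigmoid(z1), shape (3, 1)

inp * dz1                      # np.matrix.__mul__ does a dot product:
                               # ValueError: shapes (1,2) and (3,1) not aligned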

Unless you specifically need matrix multiplication semantics, I think using plain np.ndarray throughout will make the programming easier.
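As a rough sketch of what that could look like (only the feedforward part, reusing the question's variable names; I haven't tried to fix the backprop maths), using @ where a matrix product is intended and * only for element-wise work:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    fx = sigmoid(x)            # already an ndarray, no cast needed
    return fx * (1 - fx)       # element-wise on ndarrays

hiddenNeurons, outputNeurons = 3, 1

W1 = np.random.rand(hiddenNeurons, 2)
b1 = np.random.rand(hiddenNeurons, 1)
W2 = np.random.rand(outputNeurons, hiddenNeurons)
b2 = np.random.rand(outputNeurons, 1)

inp = np.array([[0, 1]])       # shape (1, 2)

z1 = W1 @ inp.T + b1           # (3,2) @ (2,1) + (3,1) -> (3,1)
a1 = sigmoid(z1)
z2 = W2 @ a1 + b2              # (1,3) @ (3,1) + (1,1) -> (1,1)
a2 = sigmoid(z2)
print(a2.shape)                # (1, 1)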

Tls Chris