I am having trouble implementing backpropagation with the ReLU activation function. My model has two hidden layers with 10 nodes each and a single node in the output layer (so 3 weight matrices and 3 bias vectors). The rest of the model works; the problem is isolated to the backward_prop function below. The same function works when I use the sigmoid activation instead (the sigmoid versions are included as comments in the function), so I believe I am screwing up the ReLU derivative.
Can anyone push me in the right direction?
import numpy as np

# The derivative of the ReLU function is 1 if z > 0, and 0 if z <= 0
def relu_deriv(z):
    z[z > 0] = 1
    z[z <= 0] = 0
    return z
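The derivative values themselves look right to me when I sanity-check on a toy array (input made up just for illustration):

z = np.array([-2.0, 0.0, 0.5, 3.0])
print(relu_deriv(z))  # prints [0. 0. 1. 1.]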
# Handles a single backward pass through the neural network
def backward_prop(X, y, c, p):
    """
    cache (c): includes activations (A) and linear transformations (Z)
    params (p): includes weights (W) and biases (b)
    """
    m = X.shape[1]  # number of training examples
    dZ3 = c['A3'] - y
    dW3 = 1/m * np.dot(dZ3, c['A2'].T)
    db3 = 1/m * np.sum(dZ3, keepdims=True, axis=1)
    dZ2 = np.dot(p['W3'].T, dZ3) * relu_deriv(c['A2'])  # sigmoid: replace relu_deriv w/ (1-np.power(c['A2'], 2))
    dW2 = 1/m * np.dot(dZ2, c['A1'].T)
    db2 = 1/m * np.sum(dZ2, keepdims=True, axis=1)
    dZ1 = np.dot(p['W2'].T, dZ2) * relu_deriv(c['A1'])  # sigmoid: replace relu_deriv w/ (1-np.power(c['A1'], 2))
    dW1 = 1/m * np.dot(dZ1, X.T)
    db1 = 1/m * np.sum(dZ1, keepdims=True, axis=1)
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2, "dW3": dW3, "db3": db3}
    return grads
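For context, this is roughly what the forward pass and cache layout look like (a simplified sketch, not my exact code; I'm assuming a sigmoid output unit with cross-entropy loss, which is where dZ3 = A3 - y comes from, and that each column of X is one training example):

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_prop(X, p):
    # X has shape (n_features, m); each column is one training example
    Z1 = np.dot(p['W1'], X) + p['b1']   # (10, m)
    A1 = relu(Z1)
    Z2 = np.dot(p['W2'], A1) + p['b2']  # (10, m)
    A2 = relu(Z2)
    Z3 = np.dot(p['W3'], A2) + p['b3']  # (1, m)
    A3 = sigmoid(Z3)                    # output activation
    return {'Z1': Z1, 'A1': A1, 'Z2': Z2, 'A2': A2, 'Z3': Z3, 'A3': A3}

# one training step's worth of gradients
c = forward_prop(X, p)
grads = backward_prop(X, y, c, p)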