I am having trouble implementing backpropagation with the ReLU activation function. My model has two hidden layers with 10 nodes each and a single node in the output layer (so 3 weight matrices and 3 bias vectors). The rest of the model works; the problem is isolated to the backward_prop function below. The same function works when I use the sigmoid activation instead (the sigmoid versions are included as comments in the function), so I believe I am screwing up the ReLU derivative.
Can anyone push me in the right direction?
import numpy as np

# The derivative of the ReLU function is 1 if z > 0, and 0 if z <= 0
def relu_deriv(z):
    z[z > 0] = 1
    z[z <= 0] = 0
    return z
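The derivative values themselves look right to me when I sanity-check on a toy array (input made up just for illustration):

z = np.array([-2.0, 0.0, 0.5, 3.0])
print(relu_deriv(z))  # prints [0. 0. 1. 1.]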
# Handles a single backward pass through the neural network
def backward_prop(X, y, c, p):
    """
    cache (c): includes activations (A) and linear transformations (Z)
    params (p): includes weights (W) and biases (b)
    """
    m = X.shape[1]  # number of training examples
    dZ3 = c['A3'] - y
    dW3 = 1/m * np.dot(dZ3, c['A2'].T)
    db3 = 1/m * np.sum(dZ3, keepdims=True, axis=1)
    dZ2 = np.dot(p['W3'].T, dZ3) * relu_deriv(c['A2'])  # sigmoid: replace relu_deriv w/ (1-np.power(c['A2'], 2))
    dW2 = 1/m * np.dot(dZ2, c['A1'].T)
    db2 = 1/m * np.sum(dZ2, keepdims=True, axis=1)
    dZ1 = np.dot(p['W2'].T, dZ2) * relu_deriv(c['A1'])  # sigmoid: replace relu_deriv w/ (1-np.power(c['A1'], 2))
    dW1 = 1/m * np.dot(dZ1, X.T)
    db1 = 1/m * np.sum(dZ1, keepdims=True, axis=1)
    grads = {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2, "dW3": dW3, "db3": db3}
    return grads
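For context, this is roughly what the forward pass and cache layout look like (a simplified sketch, not my exact code; I'm assuming a sigmoid output unit with cross-entropy loss, which is where dZ3 = A3 - y comes from, and that each column of X is one training example):

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def forward_prop(X, p):
    # X has shape (n_features, m); each column is one training example
    Z1 = np.dot(p['W1'], X) + p['b1']   # (10, m)
    A1 = relu(Z1)
    Z2 = np.dot(p['W2'], A1) + p['b2']  # (10, m)
    A2 = relu(Z2)
    Z3 = np.dot(p['W3'], A2) + p['b3']  # (1, m)
    A3 = sigmoid(Z3)                    # output activation
    return {'Z1': Z1, 'A1': A1, 'Z2': Z2, 'A2': A2, 'Z3': Z3, 'A3': A3}

# one training step's worth of gradients
c = forward_prop(X, p)
grads = backward_prop(X, y, c, p)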