I was trying to build a custom activation function with tflearn by making the following changes:
Add my custom activation function to activations.py:
def my_activation(x):
    return tf.where(x >= 0.0, tf.div(x ** 2, x + tf.constant(0.6)), 0.01 * x)
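To sanity-check the forward values in isolation, I evaluated the function on a few points (a minimal TF1-style sketch I wrote for this post; the session scaffolding is mine, not part of tflearn):

import tensorflow as tf

# same definition as above
def my_activation(x):
    return tf.where(x >= 0.0, tf.div(x ** 2, x + tf.constant(0.6)), 0.01 * x)

with tf.Session() as sess:
    # forward pass looks sane: roughly [-0.01, 0.0, 0.625, 1.538]
    print(sess.run(my_activation(tf.constant([-1.0, 0.0, 1.0, 2.0]))))

The forward outputs looked reasonable to me, so I assumed the function itself was fine.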
and register it in tflearn's __init__.py:
from .activations import linear, tanh, sigmoid, softmax, softplus, softsign,\
    relu, relu6, leaky_relu, prelu, elu, crelu, selu, my_activation
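To confirm the registration worked, I checked that the name is importable from the top-level package (a quick check of my own):

import tflearn
from tflearn.activations import my_activation
print(tflearn.my_activation)  # prints the function object if __init__.py was patched

Both imports succeed, so I believe the registration itself is fine.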
Since TensorFlow can perform the gradient calculation automatically, I don't need to implement the gradient function myself. As pointed out in the article Deep Learning Programming Style:
In the past, whenever someone defined a new model, they had to work out the derivative calculations by hand. While the math is reasonably straightforward, for complex models, it can be time-consuming and tedious work. All modern deep learning libraries make the practitioner/researcher’s job much easier, by automatically solving the problem of gradient calculation.
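For example, I can get the derivative without deriving it by hand (a minimal TF1-style sketch; the placeholder and session are my own scaffolding, and it assumes the patched tflearn from above):

import tensorflow as tf
from tflearn.activations import my_activation

x = tf.placeholder(tf.float32, shape=[None])
y = my_activation(x)
dy_dx = tf.gradients(y, x)[0]  # autodiff builds the gradient graph for me

with tf.Session() as sess:
    print(sess.run(dy_dx, feed_dict={x: [-1.0, 0.0, 1.0]}))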
I trained the model on the CIFAR-10 dataset using this code: https://github.com/tflearn/tflearn/blob/master/examples/images/convnet_cifar10.py, but changed all relu activations to my_activation.
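Concretely, the edit just swaps the activation name in the layer definitions, something like this (paraphrased from the linked script, not copied verbatim; the string 'my_activation' should resolve by name thanks to the __init__.py change above):

from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d

network = input_data(shape=[None, 32, 32, 3])
network = conv_2d(network, 32, 3, activation='my_activation')        # was 'relu'
network = max_pool_2d(network, 2)
network = conv_2d(network, 64, 3, activation='my_activation')        # was 'relu'
network = conv_2d(network, 64, 3, activation='my_activation')        # was 'relu'
network = max_pool_2d(network, 2)
network = fully_connected(network, 512, activation='my_activation')  # was 'relu'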
Sadly, this simple modification caused the network to fail to learn anything:
Training Step: 46 | total loss: 0.00002 | time: 1.434s
| Adam | epoch: 001 | loss: 0.00002 - acc: 0.0885 -- iter: 04416/50000
Training Step: 47 | total loss: 0.00002 | time: 1.448s
| Adam | epoch: 001 | loss: 0.00002 - acc: 0.0945 -- iter: 04512/50000
Training Step: 48 | total loss: 0.00001 | time: 1.462s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0927 -- iter: 04608/50000
Training Step: 49 | total loss: 0.00001 | time: 1.476s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0896 -- iter: 04704/50000
Training Step: 50 | total loss: 0.00001 | time: 1.491s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0919 -- iter: 04800/50000
Training Step: 51 | total loss: 0.00001 | time: 1.504s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0890 -- iter: 04896/50000
Training Step: 52 | total loss: 0.00001 | time: 1.518s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0944 -- iter: 04992/50000
Training Step: 53 | total loss: 0.00001 | time: 1.539s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0989 -- iter: 05088/50000
Training Step: 54 | total loss: 0.00001 | time: 1.553s
| Adam | epoch: 001 | loss: 0.00001 - acc: 0.0951 -- iter: 05184/50000
Training Step: 55 | total loss: 0.00000 | time: 1.567s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0964 -- iter: 05280/50000
Training Step: 56 | total loss: 0.00000 | time: 1.580s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0931 -- iter: 05376/50000
Training Step: 57 | total loss: 0.00000 | time: 1.594s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0903 -- iter: 05472/50000
Training Step: 58 | total loss: 0.00000 | time: 1.613s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0851 -- iter: 05568/50000
Training Step: 59 | total loss: 0.00000 | time: 1.641s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0835 -- iter: 05664/50000
Training Step: 60 | total loss: 0.00000 | time: 1.674s
| Adam | epoch: 001 | loss: 0.00000 - acc: 0.0834 -- iter: 05760/50000
Since I am just a beginner, I don't understand what causes the network to report near-zero loss together with low accuracy at the same time (the accuracy sits around 10%, i.e., chance level for CIFAR-10's ten classes). Is it NaN outputs? Dead weights? Can anybody tell me how to fix this? Thanks!
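In case it matters, here is how I tried to probe for NaNs in the outputs and gradients (my own diagnostic sketch; I included -0.6 in the test values because that is where the denominator x + 0.6 of the positive branch hits zero, but I don't know how to interpret the result):

import numpy as np
import tensorflow as tf
from tflearn.activations import my_activation

x = tf.placeholder(tf.float32, shape=[None])
y = my_activation(x)
g = tf.gradients(y, x)[0]

with tf.Session() as sess:
    xs = np.array([-2.0, -0.6, -0.1, 0.0, 0.5, 2.0], dtype=np.float32)
    ys, gs = sess.run([y, g], feed_dict={x: xs})
    print("NaN in outputs:  ", np.isnan(ys).any())
    print("NaN in gradients:", np.isnan(gs).any())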
Please note that I'm not asking how to build a custom activation function. That is already covered by questions such as:
- How to make a custom activation function with only Python in Tensorflow?
- How to make a piecewise activation function with Python in TensorFlow?
- How do you create a custom activation function with Keras?
- How do I use custom activation functions in TensorFlow for Neural Networks?
- Is it possible to add new activation functions to TensorFlow / Theano / Torch? How?