
I am trying to implement my own activation function using functions from the Keras backend or TensorFlow, but I am having trouble getting this function to learn properly.

My first approach was to rebuild an existing activation function (ELU) to check whether there is a problem with my own activation function, but even the rebuilt function does not train like the activation function built into Keras or TensorFlow.

Tensorflow function:

def custom_activation(x):
    cond = tf.greater(x, tf.constant(0.0))
    return tf.where(cond,
                    x,
                    tf.subtract(tf.exp(x), tf.constant(1.0)))

Keras function:

def custom_activation(x):
    cond = K.greater(x, 0)
    return K.switch(cond, x, K.exp(x) - 1)

get_custom_objects().update({'custom_activation': Activation(custom_activation)})

I am using the MNIST dataset and a simple 8-layer fully connected network with 128 nodes in each layer to test my activation function. This network learns slightly with the built-in ELU function, but with the custom Keras or TensorFlow function the loss is instantly near zero and the accuracy doesn't improve at all.

What am I missing?

I followed "How do you create a custom activation function with Keras?" for the Keras function and this post for TensorFlow.


Full code (for copy / paste):

ELU in Keras (working normally)

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

from keras.utils import to_categorical

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

from keras import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD

model = Sequential([
    Dense(128, input_shape=x_train.shape[1:]),
    Activation('elu'),
    Dense(128),
    Activation('elu'),
    Dense(128),
    Activation('elu'),
    Dense(128),
    Activation('elu'),
    Dense(128),
    Activation('elu'),
    Dense(128),
    Activation('elu'),
    Dense(128),
    Activation('elu'),
    Dense(128),
    Activation('elu'),
    Dense(10),
    Activation('sigmoid')
])

model.compile(SGD(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(x=x_train, y=y_train,
          validation_data=[x_test, y_test],
          batch_size=64, epochs=5)

custom ELU in Keras

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

from keras.utils import to_categorical

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

from keras import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD
from keras import backend as K
from keras.utils.generic_utils import get_custom_objects

def custom_activation(x):
    cond = K.greater(x, 0)
    return K.switch(cond, x, K.exp(x) - 1)

get_custom_objects().update({'custom_activation': Activation(custom_activation)})

model = Sequential([
    Dense(128, input_shape=x_train.shape[1:]),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(10),
    Activation('sigmoid')
])

model.compile(SGD(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(x=x_train, y=y_train,
          validation_data=[x_test, y_test],
          batch_size=64, epochs=5)

custom ELU in TensorFlow with Keras API

from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(x_train.shape[0], 28*28)
x_test = x_test.reshape(x_test.shape[0], 28*28)

from tensorflow.keras.utils import to_categorical

y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras.optimizers import SGD

def custom_activation(x):
    cond = tf.greater(x, tf.constant(0.0))
    return tf.where(cond,
                    x,
                    tf.subtract(tf.exp(x), tf.constant(1.0)))

model = Sequential([
    Dense(128, input_shape=x_train.shape[1:]),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(128),
    Activation(custom_activation),
    Dense(10),
    Activation('sigmoid')
])

model.compile(SGD(lr=0.01), loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(x=x_train, y=y_train,
          validation_data=[x_test, y_test],
          batch_size=64, epochs=5)
phifre
  • The elu is defined as: if `x > 0` then `x`, otherwise `exp(x) - 1`. However, you have defined it as `exp(-x) - 1` when `x < 0`. I think that (i.e. negative sign) is the problem. – today Sep 01 '18 at 13:56
  • You are right, I will correct this. Nonetheless, even then the loss is at zero after the first epoch and there are no improvements in accuracy. (In contrast, the "real" ELU function does train.) – phifre Sep 01 '18 at 17:02

1 Answer


If you print out model.get_weights() in your custom_activation cases, you should see that the weights are all nans. That's why there are no improvements in accuracy.
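A quick way to confirm this after calling model.fit() (just a short sketch; it uses numpy, which Keras already depends on):

import numpy as np

# True for every layer whose weights contain at least one nan
print([np.isnan(w).any() for w in model.get_weights()])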

The reason is that K.exp(x) becomes inf for x > 88 or so (and the MNIST dataset contains pixel values from 0 to 255). As a result, a 0 * inf = nan calculation is encountered during gradient propagation through K.switch(). Maybe check this related TF issue for more details. It seems that K.switch() (or, equivalently, tf.where()) is not smart enough to figure out that K.exp(x) is required only when x < 0 in the custom activation.
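The nan can be reproduced directly on the tf.where() formulation. This is only a minimal sketch and assumes TF 2.x eager execution (the question uses the TF 1.x-style API), but the gradient behaviour is the same:

import tensorflow as tf

# 100.0 stands in for a raw MNIST pixel value; exp(100) overflows float32 to inf
x = tf.constant([100.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.where(x > 0.0, x, tf.exp(x) - 1.0)  # same structure as custom_activation
grad = tape.gradient(y, x)
print(grad)  # [nan], because 0 * inf = nan appears in the where/exp gradient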

I'm not an expert in TensorFlow, but I guess the reason the built-in ELU activation (which calls tf.nn.elu) works fine is that tf.nn.elu has its own gradient op. The branching between x >= 0 and x < 0 is handled inside that gradient op instead of by multiplying the gradients of the tf.exp() and tf.where() ops, so the 0 * inf = nan calculation is avoided.
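(As a side note, if the goal is just a working custom activation rather than re-deriving ELU by hand, one way to sidestep the issue is to wrap the built-in op directly, so its own gradient is used; a minimal sketch:)

def custom_activation(x):
    # delegate to the built-in op, which carries its own gradient implementation
    return tf.nn.elu(x)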

To solve the problem, you can either normalize your data before training,

x_train = x_train.reshape(x_train.shape[0], 28*28) / 255.
x_test = x_test.reshape(x_test.shape[0], 28*28) / 255.

or cap x at 0 before taking K.exp(), since we don't need the actual value of K.exp(x) when x is greater than 0.

def custom_activation(x):
    cond = K.greater(x, 0)
    return K.switch(cond, x, K.exp(K.minimum(x, 0.)) - 1)
Yu-Yang