Is there a simple way to extend an existing activation function? My custom softmax function returns: An operation has `None` for gradient

Question

I want to implement an attempt to make softmax faster by using only the top k values in the vector.

For that I tried implementing a custom function for tensorflow to use in a model:

def softmax_top_k(logits, k=10):
    values, indices = tf.nn.top_k(logits, k, sorted=False)
    softmax = tf.nn.softmax(values)
    logits_shape = tf.shape(logits)
    return_value = tf.sparse_to_dense(indices, logits_shape, softmax)
    return_value = tf.convert_to_tensor(return_value, dtype=logits.dtype, name=logits.name)
    return return_value

I'm using the fashion mnist to test, whether that attempt is working:

fashion_mnist = keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

# normalize the data
train_images = train_images / 255.0
test_images = test_images / 255.0

# split the training data into train and validate arrays (will be used later)
train_images, train_images_validate, train_labels, train_labels_validate = train_test_split(
    train_images, train_labels, test_size=0.2, random_state=133742,
)

model = keras.models.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=softmax_top_k)
])


model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model.fit(
    train_images, train_labels,
    epochs=10,
    validation_data=(train_images_validate, train_labels_validate),
)

model_without_cnn.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

model_without_cnn.fit(
    train_images, train_labels,
    epochs=10,
    validation_data=(train_images_validate, train_labels_validate),
)

But during the execution an error is occuring:

ValueError: An operation hasNonefor gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable).

I've found this: (How to make a custom activation function), which explaines how to implement a completly custom activation function to tensorflow. But since this uses and expands softmax, I thought that the gradient should still be the same.

This is my first week of coding with python and tensorflow, therefore I don't have a good overview over all the internal implementations, yet.

Is there a simpler way to extend softmax into a new function, rather than implementing it from scratch?

Thanks in advance!

jdehesa · Accepted Answer · 2019-03-12T13:49:20.353

Instead of using sparse tensors to make the tensor with "all zeros except softmaxed top-K values", use tf.scatter_nd:

import tensorflow as tf

def softmax_top_k(logits, k=10):
    values, indices = tf.nn.top_k(logits, k, sorted=False)
    softmax = tf.nn.softmax(values)
    logits_shape = tf.shape(logits)
    # Assuming that logits is 2D
    rows = tf.tile(tf.expand_dims(tf.range(logits_shape[0]), 1), [1, k])
    scatter_idx = tf.stack([rows, indices], axis=-1)
    return tf.scatter_nd(scatter_idx, softmax, logits_shape)

EDIT: Here is a slightly more complex version for tensors with an arbitrary number of dimensions. The code still requires that the number of dimensions is known at graph construction time, though.

import tensorflow as tf

def softmax_top_k(logits, k=10):
    values, indices = tf.nn.top_k(logits, k, sorted=False)
    softmax = tf.nn.softmax(values)
    # Make nd indices
    logits_shape = tf.shape(logits)
    dims = [tf.range(logits_shape[i]) for i in range(logits_shape.shape.num_elements() - 1)]
    grid = tf.meshgrid(*dims, tf.range(k), indexing='ij')
    scatter_idx = tf.stack(grid[:-1] + [indices], axis=-1)
    return tf.scatter_nd(scatter_idx, softmax, logits_shape)

Thanks for the fast answer! What would I need to change this, to be able to accept a larger dimensional tensor than 2? Do I need to grab the last Layer of it to pass it into the 2D solution? — JtheB, Mar 12 '19 at 13:41
@JtheB I added a variation for arbitrary number of dimensions. — jdehesa, Mar 12 '19 at 13:49

Is there a simple way to extend an existing activation function? My custom softmax function returns: An operation has `None` for gradient

1 Answers1

Linked