
I'm trying to implement the custom loss function from this SO post; however, I've had to make some minor changes to suit my model. For some context, I'm using multi-label targets with 5 classes (below is an example of how they're encoded).

0 => [1, 0, 0, 0, 0]
1 => [1, 1, 0, 0, 0]
2 => [1, 1, 1, 0, 0]
3 => [1, 1, 1, 1, 0]
4 => [1, 1, 1, 1, 1]
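For reference, here is a minimal sketch of this cumulative encoding in plain NumPy (the helper names are just illustrative, not part of my model code):

import numpy as np

def to_ordinal(label, num_classes=5):
    # class index k -> first k+1 positions set to 1, e.g. 2 -> [1, 1, 1, 0, 0]
    return (np.arange(num_classes) <= label).astype(int)

def from_ordinal(encoded):
    # number of ones minus 1 recovers the class index, e.g. [1, 1, 1, 0, 0] -> 2
    return int(encoded.sum()) - 1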

My custom loss function:

import tensorflow as tf
from keras import backend as K


def _cohen_kappa(y_true, y_pred, num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    kappa, update_op = tf.contrib.metrics.cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    kappa = K.cast(kappa, 'float32')
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([update_op]):
        kappa = tf.identity(kappa)
    return kappa


def cohen_kappa_loss(num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    def cohen_kappa(y_true, y_pred):
        y_true = K.cast(y_true, 'int32')
        y_pred = K.cast(y_pred + 0.5, 'int32')

        y_true = tf.subtract(K.sum(y_true, axis=1), tf.constant(1))
        y_pred = tf.subtract(K.sum(y_pred, axis=1), tf.constant(1))

        return -_cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    return cohen_kappa
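To make the conversion inside cohen_kappa concrete: rounding the predictions (adding 0.5 and casting to int) and then summing each row, minus 1, recovers the class index from the encoding above. The values here are purely illustrative:

import numpy as np

y_true_row = np.array([1, 1, 1, 0, 0])
y_pred_row = np.array([0.9, 0.8, 0.6, 0.2, 0.1])

true_class = y_true_row.sum() - 1                      # 3 - 1 = 2
pred_class = (y_pred_row + 0.5).astype(int).sum() - 1  # rounds to [1, 1, 1, 0, 0] -> 2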

This is how I'm attempting to use my loss function:

model_cohen_kappa = cohen_kappa_loss(num_classes=5)
model.compile(loss=model_cohen_kappa,
              optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
              metrics=['accuracy'])

Unfortunately, I get the following error, which is confusing since my loss function doesn't contain K.argmax, K.round, or K.eval, the operations the error message lists as common non-differentiable ops. Is there another non-differentiable operation in my custom loss function that I'm not noticing that's causing this error?

Traceback (most recent call last):
  File "small_test.py", line 106, in <module>
    main()
  File "small_test.py", line 101, in main
    max_queue_size=2
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
    model._make_train_function()
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training.py", line 509, in _make_train_function
    loss=self.total_loss)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\optimizers.py", line 184, in get_updates
    grads = self.get_gradients(loss, params)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\optimizers.py", line 91, in get_gradients
    raise ValueError('An operation has `None` for gradient. '
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
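My suspicion is that the integer casts are what breaks the gradient, rather than the ops listed in the message; a minimal TF 1.x sketch (standalone, not my actual model) seems to reproduce the same missing gradient:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 5])
y = tf.cast(x + 0.5, tf.int32)                          # same rounding trick as in the loss
z = tf.cast(tf.reduce_sum(y, axis=1) - 1, tf.float32)

print(tf.gradients(z, x))                               # [None]: no gradient path through the int casts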

While I suspect K.cast is non-differentiable, removing the snippet below from my loss function results in the following error:

kappa = K.cast(kappa, 'float32')

Error

Traceback (most recent call last):
  File "small_test.py", line 106, in <module>
    main()
  File "small_test.py", line 91, in main
    metrics=['accuracy'])
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training.py", line 342, in compile
    sample_weight, mask)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training_utils.py", line 421, in weighted
    score_array *= weights
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\math_ops.py", line 884, in binary_op_wrapper
    return func(x, y, name=name)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1180, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 6879, in mul
    "Mul", x=x, y=y, name=name)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 563, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float64 of argument 'x'.
  • You cannot compute gradients of integer values. It does not work with any TF operation, and it does not make a lot of sense, mathematically speaking. Flooring or rounding should also give either no gradient or zero gradient. – jdehesa Jul 29 '19 at 11:22
  • Forgive me if I'm misunderstanding you but the part where I cast into int32 is to convert my multi-labels into class values i.e. [1, 1, 1, 0, 0] is interpreted as 2. I think that should be ok since the gradient I'm trying to compute is the result of the `_cohen_kappa` function? – ptk Jul 29 '19 at 11:37
  • @jdehesa I've just edited my post to hopefully clarify what I'm doing. Feel free to correct me if I've misinterpreted you! – ptk Jul 29 '19 at 11:42
  • Mmm, from what I understand, if `y_true` is `[1, 1, 1, 0, 0]` then after casting to int and summing, wouldn't you have `3`? In any case, irrespective of that, any part of the graph with integer values is not differentiable. I'm not familiar with Cohen's kappa, but since it requires integer inputs I assume it can never be differentiable (seeing a bit about the definition supports this, as it is based on the ratio of agreement, which is not differentiable). Maybe a differentiable approximation could be computed but, as it is, it cannot be used as a loss value. – jdehesa Jul 29 '19 at 11:56
  • Ahh yes - that's my mistake... I see I see :( would you happen to have any suggestions on a good loss function for multi labels? I was using just binary cross entropy but I thought optimising for cohen kappa (aka quadratic weighted kappa) would yield better results since that is the metric my model is being evaluated against – ptk Jul 29 '19 at 12:02
  • 1
    Well that is kind of a research question on itself... You can look up different articles on that, for example some people have experimented with using squared loss for multi-label classification (e.g. see [this question in Cross Validated](https://stats.stackexchange.com/q/266783)). You can also use per-class sigmoid cross-entropy, [here](https://stackoverflow.com/q/48198306/1782792) I wrote a bit about "negative" examples... It is very much open for experimentation if you want to dig into that. – jdehesa Jul 29 '19 at 12:14
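A minimal sketch of the per-class sigmoid cross-entropy fallback jdehesa mentions, reusing the compile call from above. It assumes the model ends in a 5-unit sigmoid layer, and the scikit-learn call is only a suggestion for tracking quadratic weighted kappa outside the training graph:

from keras import optimizers

# Treat each of the 5 outputs as an independent binary label, which matches
# the cumulative encoding above (assumes a Dense(5, activation='sigmoid') output layer).
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
              metrics=['accuracy'])

# Quadratic weighted kappa can then be evaluated after prediction, e.g.:
# from sklearn.metrics import cohen_kappa_score
# cohen_kappa_score(true_classes, pred_classes, weights='quadratic')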
