
I'm trying to implement the custom loss function from this SO post; however, I've had to make some minor changes to suit my model. For some context, I'm using multi-label targets with 5 classes (below is an example of how they're encoded).

0 => [1, 0, 0, 0, 0]
1 => [1, 1, 0, 0, 0]
2 => [1, 1, 1, 0, 0]
3 => [1, 1, 1, 1, 0]
4 => [1, 1, 1, 1, 1]
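For reference, here is a minimal sketch of this cumulative encoding in plain NumPy (the helper names are just illustrative, not part of my model code):

import numpy as np

def to_ordinal(label, num_classes=5):
    # class index k -> first k+1 positions set to 1, e.g. 2 -> [1, 1, 1, 0, 0]
    return (np.arange(num_classes) <= label).astype(int)

def from_ordinal(encoded):
    # number of ones minus 1 recovers the class index, e.g. [1, 1, 1, 0, 0] -> 2
    return int(encoded.sum()) - 1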

My custom loss function:

import tensorflow as tf
from keras import backend as K


def _cohen_kappa(y_true, y_pred, num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    kappa, update_op = tf.contrib.metrics.cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    kappa = K.cast(kappa, 'float32')
    K.get_session().run(tf.local_variables_initializer())
    with tf.control_dependencies([update_op]):
        kappa = tf.identity(kappa)
    return kappa


def cohen_kappa_loss(num_classes=5, weights=None, metrics_collections=None, updates_collections=None, name=None):
    def cohen_kappa(y_true, y_pred):
        y_true = K.cast(y_true, 'int32')
        y_pred = K.cast(y_pred + 0.5, 'int32')

        y_true = tf.subtract(K.sum(y_true, axis=1), tf.constant(1))
        y_pred = tf.subtract(K.sum(y_pred, axis=1), tf.constant(1))

        return -_cohen_kappa(y_true, y_pred, num_classes, weights, metrics_collections, updates_collections, name)
    return cohen_kappa
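To make the conversion inside cohen_kappa concrete: rounding the predictions (adding 0.5 and casting to int) and then summing each row, minus 1, recovers the class index from the encoding above. The values here are purely illustrative:

import numpy as np

y_true_row = np.array([1, 1, 1, 0, 0])
y_pred_row = np.array([0.9, 0.8, 0.6, 0.2, 0.1])

true_class = y_true_row.sum() - 1                      # 3 - 1 = 2
pred_class = (y_pred_row + 0.5).astype(int).sum() - 1  # rounds to [1, 1, 1, 0, 0] -> 2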

This is how I'm attempting to use my loss function:

model_cohen_kappa = cohen_kappa_loss(num_classes=5)
model.compile(loss=model_cohen_kappa,
              optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
              metrics=['accuracy'])

Unfortunately, I get the following error, which is confusing since my loss function doesn't contain K.argmax, K.round, or K.eval, the operations the error message lists as common non-differentiable ops. Is there another non-differentiable operation in my custom loss function that I'm not noticing that's causing this error?

Traceback (most recent call last):
  File "small_test.py", line 106, in <module>
    main()
  File "small_test.py", line 101, in main
    max_queue_size=2
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training_generator.py", line 40, in fit_generator
    model._make_train_function()
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training.py", line 509, in _make_train_function
    loss=self.total_loss)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\optimizers.py", line 184, in get_updates
    grads = self.get_gradients(loss, params)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\optimizers.py", line 91, in get_gradients
    raise ValueError('An operation has `None` for gradient. '
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
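My suspicion is that the integer casts are what breaks the gradient, rather than the ops listed in the message; a minimal TF 1.x sketch (standalone, not my actual model) seems to reproduce the same missing gradient:

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 5])
y = tf.cast(x + 0.5, tf.int32)                          # same rounding trick as in the loss
z = tf.cast(tf.reduce_sum(y, axis=1) - 1, tf.float32)

print(tf.gradients(z, x))                               # [None]: no gradient path through the int casts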

While I suspect K.cast is non-differentiable, removing the snippet below from my loss function results in the following error:

kappa = K.cast(kappa, 'float32')

Error

Traceback (most recent call last):
  File "small_test.py", line 106, in <module>
    main()
  File "small_test.py", line 91, in main
    metrics=['accuracy'])
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training.py", line 342, in compile
    sample_weight, mask)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\keras\engine\training_utils.py", line 421, in weighted
    score_array *= weights
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\math_ops.py", line 884, in binary_op_wrapper
    return func(x, y, name=name)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1180, in _mul_dispatch
    return gen_math_ops.mul(x, y, name=name)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 6879, in mul
    "Mul", x=x, y=y, name=name)
  File "C:\Users\Anaconda3\envs\tensor\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 563, in _apply_op_helper
    inferred_from[input_arg.type_attr]))
TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type float64 of argument 'x'.
  • You cannot compute gradients of integer values. It does not work with any TF operation, and it does not make a lot of sense, mathematically speaking. Flooring or rounding should also give either no gradient or zero gradient. – jdehesa Jul 29 '19 at 11:22
  • Forgive me if I'm misunderstanding you but the part where I cast into int32 is to convert my multi-labels into class values i.e. [1, 1, 1, 0, 0] is interpreted as 2. I think that should be ok since the gradient I'm trying to compute is the result of the `_cohen_kappa` function? – ptk Jul 29 '19 at 11:37
  • @jdehesa I've just edited my post to hopefully clarify what I'm doing. Feel free to correct me if I've misinterpreted you! – ptk Jul 29 '19 at 11:42
  • Mmm, from what I understand, if `y_true` is `[1, 1, 1, 0, 0]` then after casting to int and summing, wouldn't you have `3`? In any case, irrespective of that, any part of the graph with integer values is not differentiable. I'm not familiar with Cohen's kappa, but since it requires integer inputs I assume it can never be differentiable (seeing a bit about the definition supports this, as it is based on the ratio of agreement, which is not differentiable). Maybe a differentiable approximation could be computed but, as it is, it cannot be used as a loss value. – jdehesa Jul 29 '19 at 11:56
  • Ahh yes - that's my mistake... I see I see :( would you happen to have any suggestions on a good loss function for multi labels? I was using just binary cross entropy but I thought optimising for cohen kappa (aka quadratic weighted kappa) would yield better results since that is the metric my model is being evaluated against – ptk Jul 29 '19 at 12:02
  • 1
    Well that is kind of a research question on itself... You can look up different articles on that, for example some people have experimented with using squared loss for multi-label classification (e.g. see [this question in Cross Validated](https://stats.stackexchange.com/q/266783)). You can also use per-class sigmoid cross-entropy, [here](https://stackoverflow.com/q/48198306/1782792) I wrote a bit about "negative" examples... It is very much open for experimentation if you want to dig into that. – jdehesa Jul 29 '19 at 12:14
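A minimal sketch of the per-class sigmoid cross-entropy fallback jdehesa mentions, reusing the compile call from above. It assumes the model ends in a 5-unit sigmoid layer, and the scikit-learn call is only a suggestion for tracking quadratic weighted kappa outside the training graph:

from keras import optimizers

# Treat each of the 5 outputs as an independent binary label, which matches
# the cumulative encoding above (assumes a Dense(5, activation='sigmoid') output layer).
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.SGD(lr=0.0001, momentum=0.9),
              metrics=['accuracy'])

# Quadratic weighted kappa can then be evaluated after prediction, e.g.:
# from sklearn.metrics import cohen_kappa_score
# cohen_kappa_score(true_classes, pred_classes, weights='quadratic')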
