
I'm currently trying to train an image segmentation model with Keras. I want my model to return a mask (an image containing only 0s and 1s) to apply to the input image so that only the interesting part is kept. When I train my model with an MSE loss it seems to converge, but the masks it returns have values significantly lower than 1. So I implemented a custom loss function:

import tensorflow as tf

def loss(y_true, y_pred):
  # Round the predictions to a hard 0/1 mask before comparing to the target
  thresholded_pred = tf.where(y_pred >= 0.5, 1.0, 0.0)
  sq_diff = tf.square(y_true - thresholded_pred)
  return tf.reduce_mean(sq_diff, axis=-1)

However, I get the following error:

ValueError: No gradients provided for any variable

I assume this is because of the non-differentiability of my function. How can I achieve what I want without having such errors?

I've also tried to implement the thresholding with a Lambda layer, and it raised the exact same error. I've been through a lot of similar topics, but so far none of the solutions have been satisfying.

  • The answer is simply no, you cannot use this as a loss as it's not differentiable. Other tricks would be just plain math. – Dr. Snoopy Apr 06 '23 at 19:43

2 Answers


Your problem is that tf.where doesn't provide gradients here (well, in this situation anyway: the branches 1.0 and 0.0 are constants, so no gradient can flow back to y_pred).
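To see why, here is a minimal sketch (assuming TensorFlow 2.x with eager execution; the toy values are made up for illustration) that reproduces the error outside of Keras:

import tensorflow as tf

# Toy predictions and target, standing in for the model output and the mask.
y_pred = tf.Variable([0.3, 0.7])
y_true = tf.constant([0.0, 1.0])

with tf.GradientTape() as tape:
  thresholded = tf.where(y_pred >= 0.5, 1.0, 0.0)  # hard 0/1 values
  loss = tf.reduce_mean(tf.square(y_true - thresholded))

# The hard threshold cuts the graph: the gradient w.r.t. y_pred is None.
print(tape.gradient(loss, y_pred))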

However, you're misunderstanding a few things about neural networks:

  1. Your output is (and should be) continuous for exactly this reason. While training your model you want to know how far the output is from where you want it, not just that it is wrong. If you know how far away it is, you can slowly step towards it until all the values you want to be 1 are very close to 1, and all the values you want to be 0 are very close to zero. They'll (almost) never be exactly zero. You can read more about this here.
  2. While you shouldn't simply round your values to 0 or 1 while training your model, you should coax them towards those values using something like a sigmoid activation function applied to your output. The sigmoid maps large negative numbers to values near 0 and large positive numbers to values near 1, with a smooth, continuous transition in between.
  3. While you shouldn't round your values to 0 or 1 in your loss function while training, you can round the output of the model during prediction. This will give you the pure segmentation map you can then use as needed (see the sketch after this list).
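As a rough sketch of points 2 and 3 (the architecture, layer sizes and input shape below are made up purely for illustration, not taken from the question):

import tensorflow as tf

# Tiny hypothetical segmentation network; the only important part is the
# sigmoid on the last layer, which keeps the output continuous in (0, 1).
model = tf.keras.Sequential([
  tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu",
                         input_shape=(64, 64, 1)),
  tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="mse")  # train on the continuous output

# After training, round only at prediction time to get the hard mask:
#   probs = model.predict(images)                  # values in (0, 1)
#   hard_mask = (probs >= 0.5).astype("float32")   # pure 0/1 segmentation map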
David

I assume this is because of the non-differentiability of my function. How can I achieve what I want without having such errors?

You cannot. Neural networks are (most of the time) trained with gradient-based methods (e.g. backpropagation). The function you defined has zero gradient (almost) everywhere, and thus can't be used. That's it.

That being said, I believe you are starting from the wrong assumption. The fact that you are effectively looking for a binary classification does not mean your loss has to binarise anything (your mask is nothing but a multi-label classification problem; each "pixel" of the mask is a binary classification of its own). In particular, a typical binary classifier does not binarise its predictions during learning; you only do this during inference.

What you are looking for is the standard sigmoid cross-entropy (binary cross-entropy in Keras). Then during prediction you just threshold at 0.5.
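A minimal sketch of this (assuming TensorFlow 2.x; the toy tensors are made up, and in practice y_pred would be the sigmoid output of your network) showing that this loss does give usable gradients:

import tensorflow as tf

y_true = tf.constant([[0.0, 1.0, 1.0, 0.0]])
y_pred = tf.Variable([[0.2, 0.7, 0.4, 0.1]])  # continuous sigmoid outputs

with tf.GradientTape() as tape:
  # Per-pixel binary cross-entropy on the continuous predictions.
  loss = tf.keras.losses.binary_crossentropy(y_true, y_pred)

# Well-defined gradients, unlike the thresholded loss from the question.
print(tape.gradient(loss, y_pred))

# Binarise only at inference time:
mask = tf.where(y_pred >= 0.5, 1.0, 0.0)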

lejlot