How is the smooth dice loss differentiable?

Question

I am training a U-Net in Keras by minimizing the dice_loss function that is commonly used for this problem (adapted from here and here):

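from keras import backend as K  # Keras 2.x backend API

def dsc(y_true, y_pred):
    # Soft Dice coefficient on continuous predictions
    smooth = 1.  # smoothing term added to numerator and denominator
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    # Element-wise product stands in for the set intersection
    intersection = K.sum(y_true_f * y_pred_f)
    score = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return score

def dice_loss(y_true, y_pred):
    return 1. - dsc(y_true, y_pred)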

This implementation differs from the traditional Dice loss because it has a smoothing term to make it "differentiable". I don't understand how adding the smooth term, instead of something like 1e-7 in the denominator, makes it better, since it actually changes the loss values. I checked this by evaluating a trained U-Net on a test set with a regular Dice implementation, as follows:

import numpy as np

def dice(im1, im2):
    # Hard (binary) Dice: both masks are cast to booleans first
    im1 = np.asarray(im1).astype(bool)
    im2 = np.asarray(im2).astype(bool)
    intersection = np.logical_and(im1, im2)
    return float(2. * intersection.sum() / (im1.sum() + im2.sum() + 1e-7))
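To make the observation that smooth "actually changes the loss values" concrete, here is a minimal NumPy sketch. The helpers smooth_dice and hard_dice are hypothetical ports of dsc and dice above, not library functions. It suggests that smooth = 1 shifts the score a lot for a tiny structure and barely at all for a large one.

```python
import numpy as np

def smooth_dice(y_true, y_pred, smooth=1.0):
    # Hypothetical NumPy port of dsc(): smooth in numerator and denominator
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    return float((2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth))

def hard_dice(im1, im2, eps=1e-7):
    # Same formula as dice(): eps only guards the denominator
    im1 = np.asarray(im1).astype(bool)
    im2 = np.asarray(im2).astype(bool)
    intersection = np.logical_and(im1, im2).sum()
    return float(2.0 * intersection / (im1.sum() + im2.sum() + eps))

# Tiny structure: one foreground pixel in the ground truth, prediction misses it
y_true = np.zeros((4, 4))
y_true[0, 0] = 1
y_pred = np.zeros((4, 4))
print(hard_dice(y_true, y_pred))    # 0.0
print(smooth_dice(y_true, y_pred))  # (0 + 1) / (1 + 0 + 1) = 0.5

# Large structure: prediction misses one row of a 32x32 foreground
big_true = np.ones((32, 32))
big_pred = big_true.copy()
big_pred[0, :] = 0
print(round(hard_dice(big_true, big_pred), 4))    # 0.9841
print(round(smooth_dice(big_true, big_pred), 4))  # 0.9841
```

The effect of smooth = 1 is roughly "one extra pixel of agreement", so it matters mostly when masks contain few foreground pixels.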

Can someone explain why the smooth dice loss is conventionally used?

Why do you believe that this `smooth` term makes the loss function differentiable? — zimmerrol, Aug 23 '18 at 01:31
Pretty sure I read it somewhere but I probably mixed up the concepts. — zucchinifries, Aug 23 '18 at 06:04

score 14 · Accepted Answer · edited Apr 26 '22 at 10:22


Adding smooth to the loss does not make it differentiable. What makes it differentiable is:

1. Relaxing the threshold on the prediction: you do not cast y_pred to np.bool, but leave it as a continuous value between 0 and 1.
2. You do not use set operations such as np.logical_and, but rather use the element-wise product to approximate the non-differentiable intersection operation.
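One way to see why these two points matter is to estimate the gradient numerically. The sketch below is plain NumPy rather than the Keras code; the function names are hypothetical and the 0.5 threshold is an assumption. It compares the soft Dice used in dsc with a version that binarizes y_pred first.

```python
import numpy as np

def soft_dice(y_true, y_pred, smooth=1.0):
    # Continuous predictions, element-wise product as the "intersection"
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth)

def thresholded_dice(y_true, y_pred, smooth=1.0, threshold=0.5):
    # Same formula, but y_pred is binarized first (a step function)
    y_bin = (y_pred > threshold).astype(float)
    return soft_dice(y_true, y_bin, smooth)

def numerical_grad(f, y_true, y_pred, h=1e-4):
    # Central finite differences with respect to each predicted pixel
    grad = np.zeros_like(y_pred)
    for i in range(y_pred.size):
        step = np.zeros_like(y_pred)
        step.flat[i] = h
        grad.flat[i] = (f(y_true, y_pred + step) - f(y_true, y_pred - step)) / (2 * h)
    return grad

y_true = np.array([1.0, 1.0, 0.0, 0.0])
y_pred = np.array([0.7, 0.2, 0.6, 0.1])  # probabilities, e.g. sigmoid outputs

g_soft = numerical_grad(soft_dice, y_true, y_pred)
g_hard = numerical_grad(thresholded_dice, y_true, y_pred)
print(g_soft)  # non-zero for every pixel
print(g_hard)  # all zeros: small changes never cross the threshold
```

For the soft version the derivative also has a closed form, dD/dp_i = (2 t_i (sum(t) + sum(p) + s) - (2 sum(t*p) + s)) / (sum(t) + sum(p) + s)^2 with s = smooth. It exists whenever the denominator is non-zero, including s = 0, which is consistent with the point that smooth is not what provides differentiability.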

You add smooth only to avoid division by zero when neither y_pred nor y_true contains any foreground pixels.
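The empty-mask case can be checked directly. In this NumPy sketch (helper names are hypothetical), an empty ground truth and an empty prediction, which agree perfectly, score 1.0 when the smoothing term is added to both numerator and denominator, 0.0 when a small epsilon is added only to the denominator (as in the question's dice), and NaN with no guard at all.

```python
import numpy as np

def dice_smooth(y_true, y_pred, smooth=1.0):
    # smooth added to BOTH numerator and denominator, as in dsc()
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    return float((2.0 * intersection + smooth) / (y_true.sum() + y_pred.sum() + smooth))

def dice_eps(y_true, y_pred, eps=1e-7):
    # eps added to the denominator only, as in dice()
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    intersection = np.sum(y_true * y_pred)
    return float(2.0 * intersection / (y_true.sum() + y_pred.sum() + eps))

empty_true = np.zeros((8, 8))
empty_pred = np.zeros((8, 8))

print(dice_smooth(empty_true, empty_pred))              # 1.0
print(dice_smooth(empty_true, empty_pred, smooth=1e-7)) # 1.0
print(dice_eps(empty_true, empty_pred))                 # 0.0

with np.errstate(invalid="ignore"):
    print(dice_smooth(empty_true, empty_pred, smooth=0.0))  # nan (0 / 0)
```

Here the placement of the term matters as much as its size: smooth = 1e-7 in both places still gives 1.0 for two empty masks.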

edited Apr 26 '22 at 10:22

Innat


answered Aug 23 '18 at 05:44

Shai


Thank you! I muddled everything in my head. Just to follow up: if I use smooth = 1 for training, should I use the same smooth = 1 for inference? I originally used a very small value to avoid division by zero, but smooth = 1 rather than 1e-7 seems to improve my results. – zucchinifries Aug 23 '18 at 06:02
I suppose during inference you need to report the exact dice loss and not the smooth one. @nababs – Shai Aug 23 '18 at 06:18
@Shai, hello, regarding 'Relaxing the threshold', what do you mean? Do nothing, or clip it? – Alex Luya Sep 17 '20 at 02:49
@AlexLuya you do not threshold at all; thresholding is not differentiable. – Shai Sep 17 '20 at 04:13
@Shai, thanks, I knew that; I was just curious which kinds of 'relaxed threshold' are differentiable. – Alex Luya Sep 17 '20 at 06:39
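Following the suggestion in the comments to report the exact (unsmoothed) score at inference, here is a minimal NumPy sketch. Thresholding is acceptable here because no gradient is needed at inference, which is consistent with not thresholding during training. The function name, the 0.5 threshold, and returning 1.0 for two empty masks are all assumptions, not a standard.

```python
import numpy as np

def inference_dice(y_true, y_prob, threshold=0.5):
    # Binarize the network's probabilities; no gradients are needed here
    pred = np.asarray(y_prob) > threshold
    true = np.asarray(y_true).astype(bool)
    denom = true.sum() + pred.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (assumed convention)
        return 1.0
    return float(2.0 * np.logical_and(true, pred).sum() / denom)

y_true = np.array([[1, 1, 0], [0, 0, 0]])
y_prob = np.array([[0.9, 0.4, 0.6], [0.1, 0.2, 0.3]])
print(inference_dice(y_true, y_prob))  # 0.5
```

Handling the empty case explicitly keeps the reported score exact for every other image instead of perturbing all of them with a smoothing term.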
