
I'm trying to implement the MCFD loss function (MCFD_loss_function, p. 6) from this document: Loss functions

So I created a new function like this:

from keras import backend as K

def mcfd_loss(y_true, y_pred):
    return K.sum(                      # ∑
        K.cast(
            K.greater(                 # only values greater than 0 (+ float32 cast)
                K.dot(K.sign(y_pred),  # π
                      K.sign(y_true)),
                0),
            'float32'))

But when I start training, this error is raised:

ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

I don't know what I missed. The error seems to be raised because I use the greater function, but I don't understand what this error means or how to correct my problem.

Thanks.

bastien enjalbert
  • As the error says: training a NN means calculating the gradients starting from the loss function, so every op used in your loss function has to be differentiable. Simply check which op causes this error; you can then replace it with a differentiable op that approximates its behaviour, e.g. a steep tanh for the sign op. – dennis-w Mar 11 '18 at 22:01
  • Thank you for your answer. I understand that in Keras every operation in a loss function has to be differentiable. What I don't really understand is how the greater or cast operations can be made differentiable. Furthermore, the sign op is differentiable except at 0, isn't it? – bastien enjalbert Mar 14 '18 at 09:04
  • Well, you have to test what is acceptable to TensorFlow and what is not. Maybe tf does not care about zero in sign, maybe it does. You can approximate the K.greater function with a sigmoid that is so steep it looks like a greater function (you also have to shift it along the x axis); a sketch of this idea follows below. Then you probably won't need the cast anymore. – dennis-w Mar 14 '18 at 09:25
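
For illustration, a minimal sketch of the steep-sigmoid idea from the comment above (the helper name greater_sigmoid and the slope value are assumptions, not from the thread):

from keras import backend as K

def greater_sigmoid(x, threshold=0.0, slope=100.0):
    # A steep sigmoid centered at `threshold`: close to 0 below it and close
    # to 1 above it, but differentiable everywhere, unlike K.greater.
    return K.sigmoid(slope * (x - threshold))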

1 Answer


You want your loss function to check whether sign(f_{t,1}) * sign(Y_{t+1}) is greater than 0. Since sign is not differentiable at 0, I would suggest using softsign instead: softsign(x) = x / (1 + |x|), which is differentiable everywhere.

Since the greater-than function is also not differentiable, one can use the following approximation (see here): max_ϵ(x, y) := 0.5 * (x + y + abs_ϵ(x − y)), where abs_ϵ(x) := sqrt(x² + ϵ) and ϵ > 0. For simplicity, I will call this approximation greater_approx in the code example below; it is just the calculation above written out, as sketched next.
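
Written out, greater_approx could look like the following sketch (the ϵ value here is an assumption; per the formula above, it returns the smoothed maximum itself rather than a boolean):

from keras import backend as K

def greater_approx(x, y, epsilon=1e-4):
    # abs_eps(z) := sqrt(z^2 + eps) is a differentiable stand-in for |z|,
    # so 0.5 * (x + y + abs_eps(x - y)) smoothly approximates max(x, y).
    return 0.5 * (x + y + K.sqrt(K.square(x - y) + epsilon))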

Looking at the loss function's definition, you have to divide the sum by the number of predictions (K.get_variable_shape(y_pred)[0]) and also add a minus sign. P corresponds to the number of predictions according to the Loss Functions in Time Series Forecasting paper.

All in all, your loss function should look like this:

def mcfd_loss(y_true, y_pred):
    return -(1 / K.get_variable_shape(y_pred)[0]) * K.sum(  # -1/P * ∑
        K.cast(
            greater_approx(                   # only values greater than 0 (+ float32 cast)
                K.dot(K.softsign(y_pred),     # π
                      K.softsign(y_true)),
                0),
            'float32'))
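
As a quick smoke test (the shapes and values here are made up, and square 8×8 matrices are used so that K.dot is valid), you can evaluate the loss on constant tensors:

import numpy as np
from keras import backend as K

y_true = K.constant(np.random.randn(8, 8).astype('float32'))
y_pred = K.constant(np.random.randn(8, 8).astype('float32'))

# Evaluate the symbolic loss tensor to a concrete number.
print(K.eval(mcfd_loss(y_true, y_pred)))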

Last remark: for using a custom loss function in Keras, check out this SO question.

Simdi