
Problem Description

I am trying to train a network with Keras based on TensorFlow 2.3.0. The task is to create new pictures. As a first simple prototype / proof of concept, I am trying to train the network to create pictures with a given number of non-black pixels. For that I need to define a custom loss function. Doing so I get the error ValueError: No gradients provided for any variable, which I have not been able to solve yet.

On top of that, I would prefer a way to code this loss function without having to run eagerly (see my previous question).

Code snippet

import numpy as np
import tensorflow as tf

def custom_loss(y_true, y_pred):
    ndarray = y_pred.numpy()
    mse = np.zeros(ndarray.shape[0])
    for i in range(ndarray.shape[0]):
        true_area = int(y_true[i][0].numpy() * 100000)
        pic = ndarray[i, :, :, :]
        img_np = (pic * 255).astype(np.uint8)
        img = tf.keras.preprocessing.image.array_to_img(img_np)
        count_area = count_nonblack_pil(img)
        mse[i] = ((count_area - true_area) / 100000)**2
        #img.save(f"custom_loss {i:03d} True {true_area:06d} Count {count_area:06d} MSE {mse[i]:0.4f}.jpg")
    return mse

if __name__ == '__main__':
    tf.config.run_functions_eagerly(True)
    ...
    model.compile(loss=custom_loss, optimizer="adam", run_eagerly=True)
    model.fit(x=train_data, y=train_data, batch_size=16, epochs=10)

Running this code gives me the error message:

ValueError: No gradients provided for any variable: ['dense/kernel:0', 'dense/bias:0', 'conv2d/kernel:0', 'conv2d/bias:0', 'conv2d_1/kernel:0', 'conv2d_1/bias:0', 'conv2d_2/kernel:0', 'conv2d_2/bias:0', 'conv2d_3/kernel:0', 'conv2d_3/bias:0'].

What I have tried so far

The error sounds like the loss function is not differentiable, but why shouldn't it be?

Googling for a solution, I found the suggestion that I might have forgotten to pass the labels; same here. But I already checked this by saving some pictures together with their labels (see the line commented out in the code above). This works just fine!

Other than that, I was not able to find any useful hint, and not too many Google hits overall (does what I am trying to do seem to be exotic?). Any thoughts?

Edit

Thank you for your quick feedback, and sorry for not describing the task of the loss function very clearly. Let me give it another try:

I have a model that creates a full 533x800 RGB picture based on a single float input, which is passed on to the loss function as y_true. The picture created by the model is also passed on to the loss function as y_pred. The loss function now calls a small function count_nonblack_pil to count the number of non-black pixels in y_pred. The loss is then calculated as the squared difference between y_true and the counted pixels. By minimizing this difference, I expect to train the model so that it is able to create a picture with a number of non-black pixels close to the input value. Not really useful, but a simple proof of concept of what I plan to do later with different loss functions (where I want to use other already trained models to calculate the loss for more useful and sophisticated tasks).

Hope that makes sense. To make it more clear:

y_true size: 16
y_pred size: 20467200

y_pred contains 16 pictures of 533x800 pixels with 3 color channels, i.e. 16 * 533 * 800 * 3 = 20467200 values. y_true contains just the 16 target pixel counts.
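For completeness, count_nonblack_pil is not shown above. A minimal sketch of what it does (this is an assumed reconstruction for an RGB image, not my actual helper):

```python
import numpy as np
from PIL import Image

def count_nonblack_pil(img: Image.Image) -> int:
    # A pixel counts as non-black if any of its RGB channels is non-zero.
    arr = np.asarray(img)
    return int(np.count_nonzero(arr.any(axis=-1)))
```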

Edit: Solution

I have now understood the problem, nicely summarized by JimBiardCics: "Keep in mind that the python function you write (custom_loss) is called to generate and compile a C function. The compiled function is what is called during training. When your python custom_loss function is called, the arguments are tensor objects that don't have data attached to them. The K.eval call will fail, as will the K.shape call."
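Based on that, a reformulation that stays inside the graph replaces the hard pixel count with a soft count built from TF ops only. This is a sketch, with threshold and sharpness as assumed tuning parameters, and it counts channel activations rather than whole pixels, which is close enough for this proof of concept:

```python
import tensorflow as tf

def soft_count_loss(y_true, y_pred, threshold=0.01, sharpness=200.0):
    # A steep sigmoid maps each channel value to ~0 (black) or ~1 (non-black),
    # giving a smooth, differentiable stand-in for the hard pixel count.
    soft_on = tf.sigmoid(sharpness * (y_pred - threshold))
    # Sum over height, width and channels to get one count per image.
    count = tf.reduce_sum(soft_on, axis=[1, 2, 3])
    # Same scaling as before: y_true stores target_count / 100000.
    target = y_true[:, 0] * 100000.0
    return tf.square((count - target) / 100000.0)
```

Because every operation is a TF op, the tape can propagate gradients through it, and no eager execution is needed.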

Frank Jacob
  • Can you explain both conceptually and in more precise mathematical terms what the loss fct is supposed to do? Your code is very unclear. This reads like you're trying to get *less* black pixels in the pred. If your overall goal is to reduce the *amount* of black pixels in the pred, a more traditional and straight-forward approach would be to just count the number of black pixels in the prediction and add them as a weighted penalty to the original loss (which I assume was MSE): `Loss = alpha * MSE + beta * sum(black_pixels_in_pred)` with `alpha`, `beta` adjusting the influence – runDOSrun Sep 14 '20 at 07:34
  • Thanks for the edit but maybe you can address/comment on the posted answer directly at this point. From my understanding, my solution addresses your problem correctly. If not, let us know why. – runDOSrun Sep 17 '20 at 10:09

2 Answers


A custom loss should only use TensorFlow operations. TensorFlow can't (yet) calculate gradients through operations from NumPy or any other library. You will have to change all calculations to tf.* operations.

Just as a note, eager execution doesn't help with this problem.
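A minimal toy illustration of the difference (a hypothetical example, not the OP's model):

```python
import numpy as np
import tensorflow as tf

x = tf.Variable([1.0, 2.0, 3.0])

# Pure TF ops: the tape records them and can differentiate.
with tf.GradientTape() as tape:
    y_tf = tf.reduce_sum(tf.square(x))
print(tape.gradient(y_tf, x))  # gradient 2*x = [2. 4. 6.]

# Detour through numpy: .numpy() detaches the value from the graph,
# and wrapping the result back in a tensor does not reconnect it.
with tf.GradientTape() as tape:
    y_np = tf.constant(np.sum(np.square(x.numpy())))
print(tape.gradient(y_np, x))  # None -- no gradient path back to x
```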

tornikeo
  • Note that even if this were to use TF functions only, it would likely not be differentiable if "hard" counts are used -- if a pixel counts as either on or off, the gradient is zero almost everywhere and undefined at the threshold. – xdurch0 Sep 13 '20 at 22:24

The error is thrown because these operations are not part of the graph and can therefore not be differentiated. What you're trying to do doesn't require eager execution, but a sequence of TensorFlow methods that do what you want.

Since the exact specifics of your algorithm are a bit fuzzy, I'll propose a somewhat simpler solution. It seems that your overall goal is to generate similar images but reduce the amount of black pixels compared to the original. You can do this by retaining the original loss but adding a penalty. I'll assume you need the MSE loss, but it doesn't matter, as you can use any other:

Loss = alpha * MSE + beta * nr_of_black_pixels_in_pred 

with alpha, beta adjusting the influence for each. This can be achieved in a loss like this:

def custom_loss(y_true, y_pred):
    alpha, beta = 0.8, 0.2 # percentage influence
    mse = tf.keras.losses.mean_squared_error(y_true, y_pred)
    count = tf.where(y_pred==0., tf.ones_like(y_pred), tf.zeros_like(y_pred))
    bp = tf.math.reduce_sum(count)
    return alpha * mse + beta * bp

The benefit is that you can now even e.g. say y_pred<50. if you want to include "blackish" pixel values.

Why does this work and why is this a better solution? If we only penalized black pixels, the network could potentially just generate white images to get the best possible loss (or set all pixels with value 0 to 1). None of these "cheating" solutions are probably desirable. So we need to keep the original loss to retain the original behavior and a penalty to modify it.

The additional new penalty now automatically reduces how often the boolean condition in tf.where is true. Since this number and the loss are on completely different scales, you might have to additionally normalize the penalty. alpha and beta will also have to be patiently and empirically optimized; these kinds of parameters have a very small range of values where they work properly, which you need to find. For this, I would recommend adding a custom metric that prints out what percentage of the total loss is caused by the penalty. Due to the scale differences, it might be necessary to set beta to a very small number (but this is highly application-specific).
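Such a metric could look like this (a sketch reusing the same alpha, beta and penalty as in the loss above; the epsilon only guards against division by zero):

```python
import tensorflow as tf

alpha, beta = 0.8, 0.2  # keep in sync with the loss

def penalty_fraction(y_true, y_pred):
    # Fraction of the total loss contributed by the black-pixel penalty.
    mse = tf.reduce_mean(tf.keras.losses.mean_squared_error(y_true, y_pred))
    count = tf.where(y_pred == 0., tf.ones_like(y_pred), tf.zeros_like(y_pred))
    bp = tf.math.reduce_sum(count)
    total = alpha * mse + beta * bp
    return (beta * bp) / (total + 1e-12)

# model.compile(loss=custom_loss, metrics=[penalty_fraction], ...)
```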

runDOSrun