I am trying to implement sample- and pixel-dependent loss weighting in tf.keras (TensorFlow 2.0.0rc0) for a 3-D U-Net with sparse annotation data (Çiçek et al. 2016, arXiv:1606.06650).

This is my code:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, losses, models

# disabling eager execution makes this example work:
# tf.compat.v1.disable_eager_execution()


def get_loss_fcn(w):
    def loss_fcn(y_true, y_pred):
        loss = w * losses.mse(y_true, y_pred)
        return loss
    return loss_fcn


data_x = np.random.rand(5, 4, 1)
data_w = np.random.rand(5, 4)
data_y = np.random.rand(5, 4, 1)

x = layers.Input([4, 1])
w = layers.Input([4])
y = layers.Activation('tanh')(x)
model = models.Model(inputs=[x, w], outputs=y)
loss = get_loss_fcn(model.input[1])

# using another loss makes it work, too:
# loss = 'mse'

model.compile(loss=loss)
model.fit((data_x, data_w), data_y)

print('Done.')

This runs fine with eager execution disabled, but one of the points of TensorFlow 2 is to have eager execution by default. What stands between me and that goal is the custom loss function, as the traceback shows (using 'mse' as the loss avoids the error, too):

  File "MWE.py", line 30, in <module>
    model.fit((data_x, data_w), data_y)
[...]
tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor 'input_2:0' shape=(None, 4) dtype=float32>]

What can I do to make this kind of structure work with eager execution?

One idea that I had was to concatenate w to the output y and separate y_pred into the original y_pred and w in the loss function, but this is a hack I'd like to avoid. It works, though, with changes marked by # HERE:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, losses, models


# HERE
def loss_fcn(y_true, y_pred):
    w = y_pred[:, :, -1]  # HERE
    y_pred = y_pred[:, :, :-1]  # HERE
    loss = w * losses.mse(y_true, y_pred)
    return loss


data_x = np.random.rand(5, 4, 1)
data_w = np.random.rand(5, 4, 1)  # HERE
data_y = np.random.rand(5, 4, 1)

x = layers.Input([4, 1])
w = layers.Input([4, 1])  # HERE
y = layers.Activation('tanh')(x)
output = layers.Concatenate()([y, w])  # HERE
model = models.Model(inputs=[x, w], outputs=output)  # HERE
loss = loss_fcn  # HERE

model.compile(loss=loss)
model.fit((data_x, data_w), data_y)

print('Done.')

Any other ideas?

bers

3 Answers

9

One alternative solution is to pass weights as additional output features rather than input features.

This keeps the model completely free of anything weights related, and the weights appear only in the loss function and the .fit() call:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, losses, models

data_x = 2 * np.ones((7, 11, 15, 3), dtype=float)
data_y = 5 * np.ones((7, 9, 13, 5), dtype=float)

x = layers.Input(data_x.shape[1:])
y = layers.Conv2D(5, kernel_size=3)(x)
model = models.Model(inputs=x, outputs=y)


def loss(y_true, y_pred):
    (y_true, w) = tf.split(y_true, num_or_size_splits=[-1, 1], axis=-1)
    loss = tf.squeeze(w, axis=-1) * losses.mse(y_true, y_pred)

    tf.print(tf.math.reduce_mean(y_true), "== 5")
    tf.print(tf.math.reduce_mean(w), "== 3")

    return loss


model.compile(loss=loss)

data_w = 3 * np.ones((7, 9, 13, 1), dtype=float)
data_yw = np.concatenate((data_y, data_w), axis=-1)
model.fit(data_x, data_yw)
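
To make the arithmetic of this loss concrete, here is a NumPy-only sketch of the same pack, split, and weight steps. The small shapes and the constant prediction of 4 are assumptions chosen for illustration, and plain slicing stands in for tf.split and tf.squeeze:

```python
import numpy as np

# Hypothetical small shapes: batch of 2, 3x3 grid, 5 channels.
y_true = 5.0 * np.ones((2, 3, 3, 5))
y_pred = 4.0 * np.ones((2, 3, 3, 5))  # stand-in for the model output
w = 3.0 * np.ones((2, 3, 3, 1))

# Pack the weights as one extra trailing channel, as done before .fit().
y_true_w = np.concatenate((y_true, w), axis=-1)  # shape (2, 3, 3, 6)

# Unpack inside the "loss": the last channel holds the weights.
targets = y_true_w[..., :-1]  # shape (2, 3, 3, 5)
weights = y_true_w[..., -1]   # shape (2, 3, 3)

# Per-pixel MSE over the channel axis, then per-pixel weighting.
per_pixel_mse = np.mean((targets - y_pred) ** 2, axis=-1)  # shape (2, 3, 3)
weighted = weights * per_pixel_mse
```

Every pixel has a squared error of 1 and a weight of 3, so each entry of `weighted` is 3. The weights need the same shape as the per-pixel MSE, which is why the trailing axis is dropped.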

One remaining drawback is that you need to manipulate (potentially) large arrays when merging y and w with np.concatenate(), so anything more TensorFlow-like would be appreciated.
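
One possibly more TensorFlow-like option is to do the merge per element inside a tf.data pipeline, so that no merged NumPy array is built up front. This is only a sketch: `pack_weights` is a hypothetical helper name, the batch size of 4 is arbitrary, and the three source arrays still have to fit in memory.

```python
import numpy as np
import tensorflow as tf

data_x = 2 * np.ones((7, 11, 15, 3), dtype=np.float32)
data_y = 5 * np.ones((7, 9, 13, 5), dtype=np.float32)
data_w = 3 * np.ones((7, 9, 13, 1), dtype=np.float32)


def pack_weights(x, y, w):
    # Hypothetical helper: append the weights as a trailing channel
    # of the target, one sample at a time.
    return x, tf.concat([y, w], axis=-1)


dataset = (
    tf.data.Dataset.from_tensor_slices((data_x, data_y, data_w))
    .map(pack_weights)
    .batch(4)
)

# model.fit(dataset) would then take the place of model.fit(data_x, data_yw).
```

The loss function stays unchanged, because each batch still arrives as a target tensor with the weights in its last channel.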

bers
  • 1
    Have you tried training directly without using .fit() like this example: https://www.tensorflow.org/beta/guide/keras/custom_layers_and_models#putting_it_all_together_an_end-to-end_example – vgoklani Sep 01 '19 at 15:35
  • 1
    @vgoklani no, not yet. Thanks for the hint! – bers Sep 01 '19 at 19:31
  • 2
    Problem with that is that you lose a lot of the convenience of keras (i.e. callbacks). – Luke Sep 04 '19 at 21:19
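
For reference, the approach suggested in the comments (training without .fit()) might look roughly like the sketch below. It runs eagerly and passes the weights to the loss as ordinary tensors. The Dense layer, the SGD optimizer, and the learning rate are assumptions made for this example: the question's Activation-only model has no trainable variables to update. As noted above, the Keras callbacks are not available this way.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models, optimizers

data_x = np.random.rand(5, 4, 1).astype(np.float32)
data_w = np.random.rand(5, 4).astype(np.float32)
data_y = np.random.rand(5, 4, 1).astype(np.float32)

# Assumption: a Dense layer replaces the question's bare Activation
# so that the model has trainable variables.
x = layers.Input([4, 1])
y = layers.Dense(1, activation='tanh')(x)
model = models.Model(inputs=x, outputs=y)
optimizer = optimizers.SGD(learning_rate=0.1)


def train_step(batch_x, batch_y, batch_w):
    with tf.GradientTape() as tape:
        pred = model(batch_x, training=True)
        # MSE over the last axis gives shape (batch, 4), matching batch_w.
        mse = tf.reduce_mean(tf.square(batch_y - pred), axis=-1)
        loss = tf.reduce_mean(batch_w * mse)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


dataset = tf.data.Dataset.from_tensor_slices((data_x, data_y, data_w)).batch(5)
for epoch in range(3):
    for batch_x, batch_y, batch_w in dataset:
        loss = train_step(batch_x, batch_y, batch_w)
```
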
6

Another way:

from tensorflow.keras import layers, models, losses
import numpy as np

def loss_fcn(y_true, y_pred, w):
    loss = w * losses.mse(y_true, y_pred)
    return loss


data_x = np.random.rand(5, 4, 1)
data_w = np.random.rand(5, 4)
data_y = np.random.rand(5, 4, 1)

x = layers.Input([4, 1])
y_true = layers.Input([4, 1])
w = layers.Input([4])
y = layers.Activation('tanh')(x)


model = models.Model(inputs=[x, y_true, w], outputs=y)
model.add_loss(loss_fcn(y_true, y, w))  # argument order matches loss_fcn(y_true, y_pred, w)


model.compile()
model.fit((data_x, data_y, data_w))

I think this is the most elegant solution.

feature_engineer
  • Could you please explain why this works? I have an idea, but I am not sure I understand the concept behind it. Are you suggesting that `y_true` was the tensor that my MWE had a problem with, and not `w`? Because I don't see any change in `w`. – bers Jan 11 '20 at 15:12
  • 1
    @bers I think the problem was that your function returned an inner function that uses the symbolic tensor. Using a function that returns a symbolic tensor, rather than one that returns a function, works because the returned symbolic tensor is lazily evaluated. – feature_engineer Jan 13 '20 at 09:28
  • @feature_engineer This makes sense, but if I do this I am getting `ValueError: No gradients provided for any variable: [...]` - any idea why this might happen? – Stefan Falk Jul 02 '20 at 11:05
4

Your code works just fine with the latest TensorFlow (2.3) if you replace your fit call with

model.fit((data_x, data_y, data_w))

So:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, losses, models


# HERE
def loss_fcn(y_true, y_pred):
    w = y_pred[:, :, -1]  # HERE
    y_pred = y_pred[:, :, :-1]  # HERE
    loss = w * losses.mse(y_true, y_pred)
    return loss


data_x = np.random.rand(5, 4, 1)
data_w = np.random.rand(5, 4, 1)  # HERE
data_y = np.random.rand(5, 4, 1)

x = layers.Input([4, 1])
w = layers.Input([4, 1])  # HERE
y = layers.Activation('tanh')(x)
output = layers.Concatenate()([y, w])  # HERE
model = models.Model(inputs=[x, w], outputs=output)  # HERE
loss = loss_fcn  # HERE

model.compile(loss=loss)
model.fit((data_x, data_y, data_w))

print('Done.')

Further, I found that tf.reduce_mean, K.mean, tf.square, tf.exp, etc. cause the same error when used in a loss function.

MarioZ