16

I am trying to optimize a model with the following two loss functions

import tensorflow as tf
from tensorflow.keras import losses as kls

def loss_1(pred, weights, logits):
    weighted_sparse_ce = kls.SparseCategoricalCrossentropy(from_logits=True)
    return weighted_sparse_ce(pred, logits, sample_weight=weights)

and

def loss_2(y_pred, y):
    return kls.mean_squared_error(y_pred, y)

however, because TensorFlow 2 expects loss functions to be of the form

def fn(y_pred, y_true):
    ...

I am using a workaround for loss_1 where I pack pred and weights into a single tensor before the call to model.fit and then unpack them inside loss_1. This is inelegant and nasty, because pred and weights have different data types, so it requires an additional cast, pack, un-pack, and un-cast each time I call model.fit.
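
Roughly, the packing workaround looks like this (a sketch only, assuming pred and weights are 1-D per-example tensors; the exact layout is just for illustration):

# pack: cast the integer labels to float so they can share a tensor with the
# float weights, then stack them side by side before handing them to fit
packed = tf.stack([tf.cast(pred, tf.float32), weights], axis=-1)

def loss_1_packed(packed, logits):
    # unpack and un-cast inside the loss
    pred = tf.cast(packed[:, 0], tf.int32)
    weights = packed[:, 1]
    weighted_sparse_ce = kls.SparseCategoricalCrossentropy(from_logits=True)
    return weighted_sparse_ce(pred, logits, sample_weight=weights)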

Furthermore, I am aware of the sample_weight argument to fit, which is kind of like the solution to this question. This might be a workable solution were it not for the fact that I am using two loss functions and I only want the sample_weight applied to one of them. Also, even if this were a solution, it would not generalize to other types of custom loss functions.
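
For reference, the sample_weight route looks roughly like this for a single loss (x, actions, and advantages are placeholder names); it scales the per-example loss that fit already computes, rather than letting the loss itself accept extra arguments:

model.compile(optimizer='adam',
              loss=kls.SparseCategoricalCrossentropy(from_logits=True))
# advantages is used by fit to scale each example's contribution to the loss
model.fit(x, actions, sample_weight=advantages)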


All that being said, my question, stated concisely, is:

What is the best way to create a loss function with an arbitrary number of arguments in TensorFlow 2?

Another thing I have tried is passing a tf.tuple, but that also seems to violate TensorFlow's expectations for a loss function's input.

Jon Deaton
  • How about using a closure? Basically, you can define a standard loss function, say `inside_loss`, that only takes (`y_true`, `y_pred`), inside your `loss_1`. You can pass weights, logits, or any other arguments to `loss_1`. Finally, `loss_1` returns the `inside_loss` function. This is pretty much how Keras loss functions are usually customized. https://github.com/keras-team/keras/issues/2121 – zihaozhihao Sep 20 '19 at 06:37
  • @zihaozhihao That's an interesting solution, but it wouldn't work when using eager tensors or NumPy arrays as inputs. – Jon Deaton Sep 20 '19 at 17:13
  • Umm, do you mean the arguments of `loss_1`? If so, I'm sure that works. – zihaozhihao Sep 20 '19 at 17:51
  • Yes, for `loss_1`, and no, it wouldn't work, because the data captured by the closure is not available at the time the closure is created. – Jon Deaton Sep 20 '19 at 18:45
  • TF 2.0 expects loss function to be of the form `def fn(y_true, y_pred)`, that is, y_true is the first argument. – toliveira Nov 09 '20 at 18:05

3 Answers

12

This problem can be easily solved using custom training in TF2. You need only compute your two-component loss function within a GradientTape context and then call an optimizer with the produced gradients. For example, you could create a function custom_loss which computes both losses given the arguments to each:

import tensorflow as tf

def custom_loss(model, loss1_args, loss2_args):
  # model: tf.keras.Model
  # loss1_args: arguments to loss_1, as a tuple.
  # loss2_args: arguments to loss_2, as a tuple.
  with tf.GradientTape() as tape:
    l1_value = loss_1(*loss1_args)
    l2_value = loss_2(*loss2_args)
    loss_value = [l1_value, l2_value]
  return loss_value, tape.gradient(loss_value, model.trainable_variables)

# In training loop:
loss_values, grads = custom_loss(model, loss1_args, loss2_args)
optimizer.apply_gradients(zip(grads, model.trainable_variables))

In this way, each loss function can take an arbitrary number of eager tensors, regardless of whether they are inputs or outputs to the model. The sets of arguments to each loss function need not be disjoint as shown in this example.
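
For concreteness, one way the pieces might fit into a training loop is sketched below. This sketch does the model's forward pass inside the tape so that both losses are differentiable with respect to the model's variables; the data names (states, actions, advantages, returns) are made-up placeholders rather than part of the answer:

optimizer = tf.keras.optimizers.Adam()

def train_step(model, states, actions, advantages, returns):
  with tf.GradientTape() as tape:
    # forward pass inside the tape so gradients can flow through the model
    logits, values = model(states, training=True)
    l1_value = loss_1(actions, advantages, logits)  # extra per-batch argument
    l2_value = loss_2(values, returns)
    loss_value = [l1_value, l2_value]
  grads = tape.gradient(loss_value, model.trainable_variables)
  optimizer.apply_gradients(zip(grads, model.trainable_variables))
  return l1_value, l2_value

for states, actions, advantages, returns in dataset:
  train_step(model, states, actions, advantages, returns)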

Jon Deaton
7

To expand on Jon's answer: in case you still want the benefits of a Keras Model, you can extend the model class and write your own custom train_step:

import tensorflow as tf
from tensorflow import keras
from tensorflow.python.keras.engine import data_adapter

# custom loss function that takes two outputs of the model
# as input parameters, which would otherwise not be possible
def custom_loss(gt, x, y):
    return tf.reduce_mean(x) + tf.reduce_mean(y)

class CustomModel(keras.Model):
    def compile(self, optimizer, my_loss):
        super().compile(optimizer)
        self.my_loss = my_loss

    def train_step(self, data):
        data = data_adapter.expand_1d(data)
        input_data, gt, sample_weight = data_adapter.unpack_x_y_sample_weight(data)

        with tf.GradientTape() as tape:
            y_pred = self(input_data, training=True)
            loss_value = self.my_loss(gt, y_pred[0], y_pred[1])

        grads = tape.gradient(loss_value, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))

        return {"loss_value": loss_value}

...

model = CustomModel(inputs=input_tensor0, outputs=[x, y])
model.compile(optimizer=tf.keras.optimizers.Adam(), my_loss=custom_loss)
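
With that in place, training goes through the usual fit call; whatever is passed as y shows up as gt in train_step. A quick smoke test might look like this (shapes are made up and depend on the actual model):

x_data = tf.random.normal((32, 10))   # placeholder inputs
gt_data = tf.random.normal((32, 1))   # placeholder targets
model.fit(x_data, gt_data, epochs=2)
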
Jodo
  • I just tried your code... but I get the error "ValueError: The model cannot be compiled because it has no loss to optimize." Working with Keras 2.3.0 and TensorFlow 2.2.0. – zwep May 07 '21 at 09:44
  • That usually means that you are either passing no loss function or a loss function without any gradients that can be used for optimization, e.g. if your loss function just returns a constant scalar. – Jodo May 08 '21 at 11:13
0

In TF 1.x we have the tf.nn.weighted_cross_entropy_with_logits function, which allows us to trade off recall and precision by adding extra positive weights for each class. In multi-label classification, it should be an (N,) tensor or NumPy array. However, in TF 2.0 I haven't found a similar loss function yet, so I wrote my own loss function with an extra argument pos_w_arr.

import tensorflow as tf
from tensorflow.keras.backend import epsilon

def pos_w_loss(pos_w_arr):
    """
    Define positive weighted loss function
    """
    def fn(y_true, y_pred):
        _epsilon = tf.convert_to_tensor(epsilon(), dtype=y_pred.dtype.base_dtype)
        _y_pred = tf.clip_by_value(y_pred, _epsilon, 1. - _epsilon)
        cost = tf.multiply(tf.multiply(y_true, tf.math.log(
            _y_pred)), pos_w_arr)+tf.multiply((1-y_true), tf.math.log(1-_y_pred))
        return -tf.reduce_mean(cost)
    return fn

I'm not sure what you mean by it not working when using eager tensors or NumPy arrays as inputs, though. Please correct me if I'm wrong.
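
Usage would be along these lines (the weight values and training data are placeholders):

pos_w_arr = tf.constant([2.0, 1.0, 0.5])  # one positive weight per class

model.compile(optimizer='adam', loss=pos_w_loss(pos_w_arr))
model.fit(x_train, y_train, epochs=5)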

zihaozhihao
  • This would work in TF 1.x, where `pos_w_arr` is not an eager tensor. In TF 2, `pos_w_arr` is not available at the time the closure is created, so it would have to be a constant. I am interested in the case where `pos_w_arr` varies across batches. – Jon Deaton Sep 20 '19 at 18:42
  • If possible, you can set `pos_w_arr` as `tf.keras.Input`. – zihaozhihao Sep 20 '19 at 20:19
  • Basically, when you fit your model, pass `x=[x_data, pos_w]`; `x_data` and `pos_w` are both `Input`s. – zihaozhihao Sep 20 '19 at 20:24
  • In my case `pos_w_arr` needs to be an *output* of the model. Would `tf.keras.Input` be the right thing then? – Jon Deaton Sep 20 '19 at 23:17