
Coming from TensorFlow, I feel like implementing anything other than basic, sequential models in Keras can be quite tricky. There is just so much going on automatically. In TensorFlow you always know your placeholders (input/output), shapes, structure, etc., so it is very easy to, for example, set up custom losses.

What is a clean way to define multiple outputs and custom loss functions?

Let's take an easy autoencoder as an example and use MNIST:

from keras.datasets import mnist

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1)

Short, convolutional encoder:

from keras.layers import Input, Dense, Conv2D, MaxPool2D, Flatten, LeakyReLU
from keras.models import Model

enc_in = Input(shape=(28, 28, 1), name="enc_in")
x = Conv2D(16, (3, 3))(enc_in)
x = LeakyReLU()(x)
x = MaxPool2D()(x)
x = Conv2D(32, (3, 3))(x)
x = LeakyReLU()(x)
x = Flatten()(x)
z = Dense(100, name="z")(x)

enc = Model(enc_in, z, name="encoder")

The decoder has a similar architecture. We do not care about padding or the decrease in dimensionality due to the convolutions, so we just apply bilinear resizing at the end to match (batch, 28, 28, 1) again:

from keras.layers import Lambda, Reshape, UpSampling2D
from keras import backend as K

def resize_images(inputs, dims_xy):
    x, y = dims_xy
    # bilinear resize via the TF backend (K.tf works with older Keras + TF 1.x)
    return Lambda(lambda im: K.tf.image.resize_images(im, (y, x)))(inputs)

# decoder
dec_in = Input(shape=(100,), name="dec_in")
x = Dense(14 * 14 * 8)(dec_in)
x = LeakyReLU()(x)
x = Reshape((14, 14, 8))(x)
x = Conv2D(32, (3, 3))(x)
x = LeakyReLU()(x)
x = UpSampling2D()(x)
x = Conv2D(16, (3, 3))(x)
x = LeakyReLU()(x)
x = Conv2D(1, (3, 3), activation="linear")(x)
dec_out = resize_images(x, (28, 28))

dec = Model(dec_in, dec_out, name="decoder")

We define our own MSE as an easy example...

def custom_loss(y_true, y_pred):
    # plain mean squared error over all pixels and the batch
    return K.mean(K.square(y_true - y_pred))

...and finally build our complete model:

from keras.optimizers import Adam

outputs = dec(enc(enc_in))
ae = Model(enc_in, outputs, name="ae")
ae.compile(optimizer=Adam(lr=1e-4), loss=custom_loss)

# training
ae.fit(x=X_train, y=X_train, batch_size=256, epochs=10)

If I define activation="sigmoid" in the last layer of the decoder in order to get nice images (output interval [0.0, 1.0]), the training loss diverges because Keras does not use the logits automatically but feeds the sigmoid activations into the loss. Thus, it is much better and faster for training to use activation="linear" in the last layer. In TensorFlow I would simply define two tensors, logits=x and output=sigmoid(x), to be able to use the logits in any custom loss function and the output for plotting or other applications.
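
In TF 1.x terms, that pattern is roughly the following sketch (here `x` is the last linear layer's output and `targets` is a placeholder for the ground truth; both are assumed from context):

import tensorflow as tf

logits = x                   # pre-activation output, fed to the loss
output = tf.sigmoid(x)       # activated output, used for plotting/inference

loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=targets, logits=logits))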

How would I do such a thing in Keras?

Additionally, if I have several outputs, how do I use them in custom loss functions, for example the KL divergence of a VAE or the loss terms of a GAN?

The functional API guide is not very helpful (especially compared to TensorFlow's very extensive guides), since it only covers basic LSTM examples where you do not have to define anything yourself, but only use predefined loss functions.

daniel451
  • You mean that one output will be just 'dangling' and not used for training, right? – mrgloom Jan 28 '19 at 17:23
  • @mrgloom yes, exactly. – daniel451 Jan 29 '19 at 01:37
  • I think it's possible by defining the model with one head and a dangling output, and then creating a `CustomCallback` derived from `keras.callbacks.Callback` in which, on epoch end, you get the output of the dangling output layer like this: https://stackoverflow.com/questions/41711190/keras-how-to-get-the-output-of-each-layer and pass it to TensorBoard. – mrgloom Jan 29 '19 at 10:10

1 Answer


In TensorFlow I would simply define two tensors, logits=x and output=sigmoid(x), to be able to use the logits in any custom loss function and the output for plotting or other applications.

In Keras you do exactly the same:

from keras.layers import Activation

x = Conv2D(1, (3, 3), activation="linear")(x)
dec_out = resize_images(x, (28, 28))  # linear output tensor used in training, fed to the loss function

training_model = Model(dec_in, dec_out, name="decoder")

...

# second head sharing all weights: apply the sigmoid on top of the linear output
sigmoid = Activation('sigmoid')(dec_out)
inference_model = Model(dec_in, sigmoid)

# train against the linear output, predict through the sigmoid head
training_model.fit(x=X_train, y=X_train, batch_size=256, epochs=10)

prediction = inference_model.predict(some_input)

In the Keras world your life becomes much easier if you have a single output tensor: the standard Keras features then just work for it. For two outputs/losses, one possible workaround is to concatenate them before the output and split them again inside the loss function. A good example is this SSD implementation, which has classification and localization losses: https://github.com/pierluigiferrari/ssd_keras/blob/master/keras_loss_function/keras_ssd_loss.py#L133
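
As a rough sketch of that concatenate-and-split workaround (not the SSD code itself; `inputs`, `class_out`, `box_out` and the slice widths are made-up placeholders):

from keras import backend as K
from keras.layers import Concatenate
from keras.models import Model

# assume two heads: class_out with 10 units and box_out with 4 units
merged_out = Concatenate(axis=-1)([class_out, box_out])
model = Model(inputs, merged_out)

def combined_loss(y_true, y_pred):
    # y_true must be packed the same way as the model output (classes first, then boxes)
    cls_true, box_true = y_true[:, :10], y_true[:, 10:]
    cls_pred, box_pred = y_pred[:, :10], y_pred[:, 10:]
    cls_loss = K.mean(K.categorical_crossentropy(cls_true, cls_pred))
    box_loss = K.mean(K.square(box_true - box_pred))
    return cls_loss + box_loss

model.compile(optimizer="adam", loss=combined_loss)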

In general, I do not understand those complaints. It is understandable that a new framework causes frustration at first, but Keras is great because it can be simple when you need standard stuff and flexible when you need to go beyond that. The number of complex model implementations in the Keras model zoo is a good justification for that. By reading that code you can learn various patterns for constructing models in Keras.

Dmytro Prylipko
  • Thanks for the clarification. Looking at the SSD file, does this mean we would do something like `fit(x=X, y=[labels, aux_labels])` and define our model likewise as `Model(inputs, [outputs, aux_outputs])` for a model that has auxiliary outputs? How would my `custom_loss` function receive its arguments then? Would the structure of `y_pred` be a list of `[main_outputs, aux_outputs]` then, and likewise `y_true` consist of `[labels, aux_labels]`? – daniel451 Jan 29 '19 at 03:47
  • In that SSD implementation the author prefers to encode the labels into one single matrix: https://github.com/pierluigiferrari/ssd_keras/blob/master/ssd_encoder_decoder/ssd_input_encoder.py#L293 Here, the `boxes` part is used for the regression loss and the `classes` part for classification. However, having two inputs and outputs can be as easy as you wrote. See this: https://keras.io/getting-started/functional-api-guide/#multi-input-and-multi-output-models – Dmytro Prylipko Jan 29 '19 at 10:36
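
For reference, the multi-output pattern from that guide looks roughly like this (a sketch with made-up layer and variable names; each loss function only ever receives the `y_true`/`y_pred` pair of its own output):

from keras.layers import Dense
from keras.models import Model

main_out = Dense(10, activation="softmax", name="main")(x)
aux_out = Dense(1, activation="sigmoid", name="aux")(x)

model = Model(inputs, [main_out, aux_out])
model.compile(optimizer="adam",
              loss={"main": "categorical_crossentropy", "aux": custom_loss},
              loss_weights={"main": 1.0, "aux": 0.5})

# targets are passed per output as well, keyed by the output layer names
model.fit(x=X, y={"main": labels, "aux": aux_labels}, batch_size=256, epochs=10)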