
This is more of a conceptual question, but it is related to a practical problem I am having. Suppose I define a model, as an example, something like this:

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, Dense, GlobalAveragePooling1D, Dropout
from tensorflow.keras.models import Model

def root(input_shape):

    input_tensor = Input(input_shape)

    cnn1 = Conv1D(100, 10, activation='relu')(input_tensor)
    mp1 = MaxPooling1D((3,))(cnn1)
    cnn3 = Conv1D(160, 10, activation='relu')(mp1)
    gap1 = GlobalAveragePooling1D()(cnn3)
    drp1 = Dropout(0.5)(gap1)

    return Model(input_tensor, drp1)

And then the two branches:

def branch_1(input_shape):

    input_tensor = Input(input_shape)
    dense1 = Dense(10, activation='relu')(input_tensor)
    prediction = Dense(1, activation='sigmoid')(dense1)

    return Model(input_tensor, prediction)

def branch_2(input_shape):

    input_tensor = Input(input_shape)
    dense1 = Dense(25, activation='relu')(input_tensor)
    dropout1 = Dropout(rate=0.4)(dense1)
    prediction = Dense(1, activation='sigmoid')(dropout1)

    return Model(input_tensor, prediction)

Now, I create my final model as:

input_shape = (256, 1)

base_model = root(input_shape)

root_input = Input(input_shape)
root_output = base_model(root_input)

b1 = branch_1(root_output.shape[1:])
b1_output = b1(root_output)

b2 = branch_2(root_output.shape[1:])
b2_output = b2(root_output)

outputs = [b1_output, b2_output]

branched_model = Model(root_input, outputs)

The `root_output` is linked to both `branch_1` and `branch_2`. As such, the error propagated back to the last layer of the `root` model comes from the outputs of both `branch_1` and `branch_2`. My question is: how are those errors combined when they are propagated back to the last layer of `root`? Can I affect the way this combination is performed?

Alberto A

1 Answer


You're not done yet: you still need to define a loss function for your model. That is where your errors are combined, for example `MSE(label1, output1) + 2 * MSE(label2, output2)`.
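
With the model you built above, you can express that weighting directly when compiling `branched_model`; a minimal sketch (the MSE losses and the 1:2 weighting here are just illustrative, not taken from your code):

branched_model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    # one loss per output, combined as 1.0 * loss1 + 2.0 * loss2
    loss=[tf.keras.losses.MeanSquaredError(),
          tf.keras.losses.MeanSquaredError()],
    loss_weights=[1.0, 2.0],
)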

So when you backpropagate a batch, you calculate a vector (the gradient) that changes all the weights (in `root`, `branch_1` and `branch_2`) so that your loss is minimized. Let's say you update your weights and forward-pass the same batch again. Now the loss will be lower (you just optimized for that batch), but `loss2` (`MSE(label2, output2)`) will have diminished roughly twice as much as `loss1` (`MSE(label1, output1)`), because it carries twice the weight in the combined loss.
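
To make the mechanics concrete, here is a rough sketch of the equivalent manual training step (`x_batch` and `y_batch` are assumed placeholders); by the chain rule, every shared weight in `root` receives the sum of the gradient contributions from both branch losses:

optimizer = tf.keras.optimizers.Adam()

with tf.GradientTape() as tape:
    out1, out2 = branched_model(x_batch, training=True)
    loss1 = tf.reduce_mean(tf.keras.losses.MSE(y_batch, out1))
    loss2 = tf.reduce_mean(tf.keras.losses.MSE(y_batch, out2))
    total_loss = loss1 + 2.0 * loss2

# For any shared weight w in root: d(total_loss)/dw = d(loss1)/dw + 2 * d(loss2)/dw
grads = tape.gradient(total_loss, branched_model.trainable_variables)
optimizer.apply_gradients(zip(grads, branched_model.trainable_variables))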

Frederik Bode
  • Hmm, interesting. Can you give me an example of how to add this to the code above? What I did in my implementation was to simply compile the `branched_model` with a loss function, for instance: `branched_model.compile(optimizer=tf.keras.optimizers.Adam(), loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])`. And I pass to `y` of `branched_model` the list `[train_y, train_y]`. Same for validation. What will happen with the loss in this case? – Alberto A Apr 13 '20 at 14:41
  • Compiling root, branch_1 or branch_2 separately doesn't mean anything, as your loss will be defined relative to the input and output of the _separate models_, not over the entire thing (which is what you want, as you train your entire model (root + 2 branches) end-to-end). You'll need to define a custom loss function over multiple outputs (https://stackoverflow.com/questions/51680818/keras-custom-loss-as-a-function-of-multiple-outputs/51685637#51685637), and compile the entire model with that loss. No need to `compile` any of the "intermediate" `tf.keras.Model`s – Frederik Bode Apr 14 '20 at 10:18
  • Kind of what I did (I only compiled `branched_model`, instead of compiling the individual parts), but instead of passing the loss to `compile`, I use the `add_loss` function to attach this custom loss (see the sketch after these comments). – Alberto A Apr 14 '20 at 12:19
  • Ah yes indeed! My mistake. – Frederik Bode Apr 16 '20 at 09:10
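
For completeness, a minimal sketch of the `add_loss` route mentioned in the comments above; the label `Input`s and the 1:2 weighting are assumptions for illustration, not part of the original code:

label1 = Input((1,))
label2 = Input((1,))

# combine both branch losses into a single scalar and attach it to the model
combined_loss = tf.reduce_mean(
    tf.keras.losses.binary_crossentropy(label1, b1_output)
    + 2.0 * tf.keras.losses.binary_crossentropy(label2, b2_output))

branched_model = Model([root_input, label1, label2], outputs)
branched_model.add_loss(combined_loss)
branched_model.compile(optimizer=tf.keras.optimizers.Adam())

# The labels are now fed as extra inputs rather than as targets:
# branched_model.fit([train_x, train_y, train_y], epochs=...)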