I have a network in Keras with many outputs; however, my training data only provides information for a single output at a time.

At the moment my method for training has been to run a prediction on the input in question, change the value of the particular output that I am training, and then do a single batch update. If I'm right, this is the same as setting the loss for all outputs to zero except the one that I'm trying to train.
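
Roughly, in code (a minimal sketch with placeholder names: model is the compiled network with a single output layer and an elementwise loss such as MSE, x is one input sample, i is the index of the output I'm training, and new_target is the training value for it):

import numpy as np

y = model.predict(x[np.newaxis])        # current values for all outputs
y[0, i] = new_target                    # overwrite only the output being trained
model.train_on_batch(x[np.newaxis], y)  # the error is zero for every other output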

Is there a better way? I've tried class weights, where I set a zero weight for all but the output I'm training, but it doesn't give me the results I expect.

I'm using the Theano backend.

simeon
  • That's an uncommon setting for supervised learning. Show some example data and explain a bit why you got this setting. – sascha Nov 06 '16 at 12:51
  • I'm using it for Deep Q-Learning. The input is a state and each output is the score for an action. You pick an action and then update the network based on the result of that action. However, you only want to update one output, as you don't know the result of the other actions... – simeon Nov 07 '16 at 09:28
  • I see. This is handled differently. Look at [these sources](https://gist.github.com/EderSantana/c7222daa328f0e885093#file-qlearn-py-L98) (I marked the line in the link). You just keep the current values for the other actions! – sascha Nov 07 '16 at 10:44
  • I would like to implement a similar CNN with multiple outputs (multi-task learning). I will run the network on the input (images), get one of the outputs; then depending on the output, select one of the other outputs to run the network and obtain the final output. In training, I will update only one of the streams at a time. This is a very common problem, I think, but strangely, there is no example or documentation to describe a solution. @simeon: did you manage to solve your problem? If so, how? Thx. – Blackberry Aug 09 '17 at 11:53
  • I actually did, the other day, and had forgotten about this post. I will put a more detailed response tonight; however, in Keras you can make multiple models with the same layers, where the values are shared (off the top of my head, you need to use the alternative to 'Sequential'). I basically made a model for each output which shared the layers. It worked well. – simeon Aug 10 '17 at 22:55

2 Answers

Outputting multiple results and optimizing only one of them

Let's say you want to return output from multiple layers, maybe from some intermediate layers, but you need to optimize only one target output. Here's how you can do it:

Let's start with this model:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)

# you want to extract these values
useful_info = Dense(32, activation='relu', name='useful_info')(x)

# final output, used for loss calculation and optimization
# (10 classes assumed; a softmax over a single unit would always output 1)
result = Dense(10, activation='softmax', name='result')(useful_info)

Compile with multiple outputs, setting the loss to None for the extra outputs that you don't want used for loss calculation and optimization:

model = Model(inputs=inputs, outputs=[result, useful_info])
model.compile(optimizer='rmsprop',
              loss=['categorical_crossentropy', None],
              metrics=['accuracy'])

Provide only the target outputs when training, skipping the extra outputs:

model.fit(my_inputs, {'result': train_labels}, epochs=..., batch_size=...)

# this also works:
# model.fit(my_inputs, [train_labels], epochs=..., batch_size=...)

One predict to get them all

With a single model, you can run predict just once to get all the outputs you need:

predicted_labels, useful_info = model.predict(new_x)
Serhiy
  • Somehow this is not working in v2.3.0 as I am getting the error: `ValueError: The two structures don't have the same sequence length. Input structure has length 1, while shallow structure has length 3.` – omsrisagar Oct 27 '20 at 23:29
  • I get the following error when attempting to apply this to my network: "ValueError: Variable has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval." My tensorflow==1.14.0. My losses are `[None, 'categorical_crossentropy']` – Cam K Mar 01 '21 at 15:48
  • @omsrisagar yes, me too! Could you find a solution by any chance? – MJimitater May 27 '21 at 08:03

In order to achieve this I ended up using the 'Functional API'. You basically create multiple models, using the same input and hidden layers but different output layers.

For example:

https://keras.io/getting-started/functional-api-guide/

from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions_A = Dense(10, activation='softmax')(x)  # 10 classes assumed
predictions_B = Dense(10, activation='softmax')(x)  # (softmax over 1 unit would always output 1)

# This creates a model that includes
# the Input layer and three Dense layers
modelA = Model(inputs=inputs, outputs=predictions_A)
modelA.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
modelB = Model(inputs=inputs, outputs=predictions_B)
modelB.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
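
Training then happens one model at a time; each fit call updates the shared input and hidden layers together with only that model's own output layer. A usage sketch, where x_A/y_A and x_B/y_B are hypothetical subsets of the data labelled for output A and output B respectively:

# updates the shared layers and the predictions_A output layer only
modelA.fit(x_A, y_A, epochs=10, batch_size=32)

# updates the shared layers and the predictions_B output layer only
modelB.fit(x_B, y_B, epochs=10, batch_size=32)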
simeon
  • The problem here is that you have to run prediction twice to get both outputs. – Serhiy May 16 '19 at 09:54
  • @Serhiy He can just create a third `predictions = Concatenate()([predictions_A, predictions_B])` and set that to the output of a third model (see the sketch below). – Bersan May 09 '20 at 20:02
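
For completeness, a rough sketch of what that comment describes, reusing the layers defined in this answer (new_x is a hypothetical batch of inputs; Concatenate comes from keras.layers):

from keras.layers import Concatenate

# a third model that shares all the layers and returns both
# predictions side by side in a single forward pass
predictions = Concatenate()([predictions_A, predictions_B])
modelAB = Model(inputs=inputs, outputs=predictions)

both_predictions = modelAB.predict(new_x)  # one predict call for both outputs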