I have a network in Keras with many outputs; however, my training data only provides information for a single output at a time.

At the moment my method for training has been to run a prediction on the input in question, change the value of the particular output that I am training, and then do a single batch update. If I'm right, this is the same as setting the loss for all outputs to zero except the one that I'm trying to train.
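
Roughly, in code (a minimal sketch with placeholder names: model is the compiled network with a single output layer and an elementwise loss such as MSE, x is one input sample, i is the index of the output I'm training, and new_target is the training value for it):

import numpy as np

y = model.predict(x[np.newaxis])        # current values for all outputs
y[0, i] = new_target                    # overwrite only the output being trained
model.train_on_batch(x[np.newaxis], y)  # the error is zero for every other output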

Is there a better way? I've tried class weights, where I set a zero weight for all but the output I'm training, but it doesn't give me the results I expect.

I'm using the Theano backend.

simeon
  • That's an uncommon setting for supervised learning. Show some example data and explain a bit why you got this setting. – sascha Nov 06 '16 at 12:51
  • I'm using it for Deep Q-Learning. The input is a state and each output is the score for an action. You pick an action and then update the network based on the result of that action. However, you only want to update one output, as you don't know the result of the other actions... – simeon Nov 07 '16 at 09:28
  • I see. This is handled differently. Look at [these sources](https://gist.github.com/EderSantana/c7222daa328f0e885093#file-qlearn-py-L98) (I marked the line in the link). You just keep the current values for the other actions! – sascha Nov 07 '16 at 10:44
  • I would like to implement a similar CNN with multiple outputs (multi-task learning). I will run the network on the input (images), get one of the outputs; then depending on the output, select one of the other outputs to run the network and obtain the final output. In training, I will update only one of the streams at a time. This is a very common problem, I think, but strangely, there is no example or documentation to describe a solution. @simeon: did you manage to solve your problem? If so, how? Thx. – Blackberry Aug 09 '17 at 11:53
  • I actually did, the other day, and had forgotten about this post. I will put a more detailed response tonight; however, in Keras you can make multiple models with the same layers, where the values are shared (off the top of my head, you need to use the alternative to 'Sequential'). I basically made a model for each output which shared the layers. It worked well. – simeon Aug 10 '17 at 22:55

2 Answers

Outputting multiple results and optimizing only one of them

Let's say you want to return output from multiple layers, maybe from some intermediate layers, but you need to optimize only one target output. Here's how you can do it:

Let's start with this model:

from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(64, activation='relu')(inputs)

# you want to extract these values
useful_info = Dense(32, activation='relu', name='useful_info')(x)

# final output, used for loss calculation and optimization
# (10 classes assumed; a softmax over a single unit would always output 1)
result = Dense(10, activation='softmax', name='result')(useful_info)

Compile with multiple outputs, setting the loss to None for the extra outputs that you don't want used for loss calculation and optimization:

model = Model(inputs=inputs, outputs=[result, useful_info])
model.compile(optimizer='rmsprop',
              loss=['categorical_crossentropy', None],
              metrics=['accuracy'])

Provide only the target outputs when training, skipping the extra outputs:

model.fit(my_inputs, {'result': train_labels}, epochs=..., batch_size=...)

# this also works:
# model.fit(my_inputs, [train_labels], epochs=..., batch_size=...)

One predict to get them all

With a single model, you can run predict just once to get all the outputs you need:

predicted_labels, useful_info = model.predict(new_x)
Serhiy
  • Somehow this is not working in v2.3.0 as I am getting the error: `ValueError: The two structures don't have the same sequence length. Input structure has length 1, while shallow structure has length 3.` – omsrisagar Oct 27 '20 at 23:29
  • I get the following error when attempting to apply this to my network: "ValueError: Variable has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval." My tensorflow==1.14.0. My losses are `[None, 'categorical_crossentropy']` – Cam K Mar 01 '21 at 15:48
  • @omsrisagar yes, me too! Could you find a solution by any chance? – MJimitater May 27 '21 at 08:03

In order to achieve this I ended up using the 'Functional API'. You basically create multiple models, using the same input and hidden layers but different output layers.

For example:

https://keras.io/getting-started/functional-api-guide/

from keras.layers import Input, Dense
from keras.models import Model

# This returns a tensor
inputs = Input(shape=(784,))

# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions_A = Dense(10, activation='softmax')(x)  # 10 classes assumed
predictions_B = Dense(10, activation='softmax')(x)  # (softmax over 1 unit would always output 1)

# This creates a model that includes
# the Input layer and three Dense layers
modelA = Model(inputs=inputs, outputs=predictions_A)
modelA.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
modelB = Model(inputs=inputs, outputs=predictions_B)
modelB.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
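
Training then happens one model at a time; each fit call updates the shared input and hidden layers together with only that model's own output layer. A usage sketch, where x_A/y_A and x_B/y_B are hypothetical subsets of the data labelled for output A and output B respectively:

# updates the shared layers and the predictions_A output layer only
modelA.fit(x_A, y_A, epochs=10, batch_size=32)

# updates the shared layers and the predictions_B output layer only
modelB.fit(x_B, y_B, epochs=10, batch_size=32)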
simeon
  • The problem here is that you have to run prediction twice to get both outputs. – Serhiy May 16 '19 at 09:54
  • @Serhiy He can just create a third `predictions = Concatenate()([predictions_A, predictions_B])` and set that to the output of a third model (see the sketch below). – Bersan May 09 '20 at 20:02
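
For completeness, a rough sketch of what that comment describes, reusing the layers defined in this answer (new_x is a hypothetical batch of inputs; Concatenate comes from keras.layers):

from keras.layers import Concatenate

# a third model that shares all the layers and returns both
# predictions side by side in a single forward pass
predictions = Concatenate()([predictions_A, predictions_B])
modelAB = Model(inputs=inputs, outputs=predictions)

both_predictions = modelAB.predict(new_x)  # one predict call for both outputs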