
For a network architecture like this:

           +-----+
input1 --->| CNN |-----+
           +-----+     |
                       |
           +-----+   +--------+        +-----+
input2 --->| CNN |-->| Concat |---+--->| VGG |---> Main_out
           +-----+   +--------+   |    +-----+
                       |          |
           +-----+     |          v
input3 --->| CNN |-----+       Aux_out
           +-----+
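For reference, here is a minimal Functional-API sketch of this topology (the layer sizes, input shapes, and the Dense stand-in for the VGG part are assumptions; what matters is that the output names main_output and aux_output match the loss_weights keys below):

from tensorflow.keras import layers, Model, Input

def cnn_branch(inp):
    # Small per-input CNN; the exact filters/kernel sizes are placeholders.
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.GlobalAveragePooling2D()(x)
    return x

input1 = Input(shape=(64, 64, 3))
input2 = Input(shape=(64, 64, 3))
input3 = Input(shape=(64, 64, 3))

merged = layers.Concatenate()([cnn_branch(input1),
                               cnn_branch(input2),
                               cnn_branch(input3)])

# The auxiliary classifier branches off right after the concatenation ...
aux_out = layers.Dense(10, activation="softmax", name="aux_output")(merged)

# ... while the main head (the VGG part, reduced to Dense layers here
# for brevity) continues from the same tensor.
x = layers.Dense(256, activation="relu")(merged)
main_out = layers.Dense(10, activation="softmax", name="main_output")(x)

model = Model(inputs=[input1, input2, input3], outputs=[main_out, aux_out])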

How does the backpropagation flow work here? Are there two backpropagation steps, or does only the one coming from Main_out update the weights?

I am using loss weights for each output:

model.compile(loss="categorical_crossentropy", optimizer=OPT, metrics=["accuracy"],
              loss_weights={'main_output': 1., 'aux_output': 0.2})
JulesR
  • @MatiasValdenegro I get better main_out accuracy when the aux_out is included. I know that aux_out could help with the vanishing gradient problem. I just want to understand how Keras handles backpropagation in this situation. Are the weights of the first three CNNs updated twice, once from main_out and again from aux_out? – JulesR Jul 26 '19 at 08:37

2 Answers


The losses for the different outputs are combined into a single final loss according to loss_weights:

final_loss = 1. * loss_main + 0.2 * loss_aux

The parameters are then updated with respect to this combined loss in a single backpropagation step at each iteration.
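Roughly, one training step then looks like this (a sketch with tf.GradientTape; model, x, y_main, and y_aux stand in for your two-output model and a batch of data):

import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

with tf.GradientTape() as tape:
    main_pred, aux_pred = model(x, training=True)
    loss_main = loss_fn(y_main, main_pred)
    loss_aux = loss_fn(y_aux, aux_pred)
    # The two losses are combined into one scalar using the loss_weights ...
    final_loss = 1. * loss_main + 0.2 * loss_aux

# ... and the gradients of that single scalar drive one update of all
# trainable weights.
grads = tape.gradient(final_loss, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables))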

dontloo

(I cannot post a comment as I don't have enough reputation, so I'm posting my question as an answer. Sorry for that, but I'm struggling to find information on this subject.)

As I asked the same question here, I also have trouble understanding how this works; like JulesR, I get better "main_output" accuracy when adding "aux_out", using a different network architecture.

If I understand dontloo's answer correctly (please correct me if I'm wrong), there is only one backpropagation step despite the multiple outputs, but the loss used is a weighted combination over the outputs. So for JulesR's network, is the update of the VGG weights during backpropagation also influenced by this weighted loss (and therefore by the "intermediate output")? If so, isn't that strange, given that the VGG network comes after this output?
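One way to check this empirically is to compute the gradients of the auxiliary loss alone and see which variables actually receive one (a sketch reusing the placeholder names model, x, y_aux for the network and a batch of data):

import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()

with tf.GradientTape() as tape:
    _, aux_pred = model(x, training=True)
    loss_aux = loss_fn(y_aux, aux_pred)

# Variables that loss_aux does not depend on, e.g. layers after the
# branch point, come back as None here.
grads = tape.gradient(loss_aux, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    print(var.name, "no gradient" if grad is None else "has gradient")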

Also, @JulesR mentioned that auxiliary outputs can help with the vanishing gradient problem. Do you have links to articles discussing the effects of auxiliary outputs?

  • Aux outputs are used in GoogLeNet; see https://medium.com/coinmonks/paper-review-of-googlenet-inception-v1-winner-of-ilsvlc-2014-image-classification-c2b3565a64e7 – JulesR Jul 29 '19 at 08:30