I am trying to interpret and understand models written in Caffe's .prototxt format.
Yesterday I came across a sample 'deploy.prototxt'
posted by Shai here, quoted below:
layer {
  name: "ip1_a"
  bottom: "data_a"
  top: "ip1_a"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w"  # NOTE THIS NAME!
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
}
layer {
  name: "ip1_b"
  bottom: "data_b"
  top: "ip1_b"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w"  # NOTE THIS NAME: it's the same!
    lr_mult: 10    # different LR for this branch
  }
  param {
    name: "ip1_b"
    lr_mult: 20
  }
}
# one layer to combine them
layer {
  type: "Concat"
  bottom: "ip1_a"
  bottom: "ip1_b"
  top: "ip1_combine"
  name: "concat"
}
layer {
  name: "joint_ip"
  type: "InnerProduct"
  bottom: "ip1_combine"
  top: "joint_ip"
  inner_product_param {
    num_output: 30
  }
}
I understand this model definition as:
 data_a           data_b
    |                |
    V                V
---------        ---------
| ip1_a |        | ip1_b |
---------        ---------
    |                |
  ip1_a            ip1_b
    |                |
    V                V
    ~~~~~~~~~~~~~~~~~~
            |
            V
      -------------
      |  concat   |
      -------------
            |
      ip1_combine
            |
            V
      -------------
      | joint_ip  |
      -------------
            |
        joint_ip
Blob ip1_a is trained by layer ip1_a, with weights initialized with ip1_w (lr_mult: 1) and bias initialized with ip1_b (lr_mult: 2). Blob ip1_a is actually the newly learned weights, which were initialized with ip1_w; the learned bias doesn't have a name.
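If that sharing is real, I expect it to be observable from pycaffe. Here is a minimal sketch of how I would check it (the file name deploy.prototxt is my own, and I assume the file also declares the data_a/data_b inputs, which the quoted snippet omits):

import numpy as np
import caffe

caffe.set_mode_cpu()
# Assumed file: the snippet above plus input declarations for
# data_a and data_b (not shown in the quoted prototxt).
net = caffe.Net('deploy.prototxt', caffe.TEST)

w_a = net.params['ip1_a'][0]  # weight blob of layer ip1_a (param "ip1_w")
w_b = net.params['ip1_b'][0]  # weight blob of layer ip1_b (same param name)

# If both layers hold the same underlying parameter blob, writing
# through one handle should be visible through the other.
w_a.data[...] = np.random.randn(*w_a.data.shape)
print(np.array_equal(w_a.data, w_b.data))  # expect True if weights are shared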
In some models, we can find layers that have:

lr_mult: 1
lr_mult: 2

where the first instance of lr_mult always corresponds to the weights and the second instance to the bias.
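As a sanity check on that ordering, my understanding is that net.params keeps a layer's parameter blobs in the order the param entries appear in the prototxt, so (reusing the assumed deploy.prototxt from above) I would expect:

import caffe

net = caffe.Net('deploy.prototxt', caffe.TEST)

# For InnerProduct/Convolution layers, param index 0 should be the
# weights and index 1 the bias, matching their order in the prototxt.
w, b = net.params['ip1_a']
print(w.data.shape)  # (10, D) -- num_output x input dimension
print(b.data.shape)  # (10,)   -- one bias per output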
Is my understanding above correct?