I am trying to interpret and understand models written in Caffe's .prototxt format.
Yesterday I came across a sample 'deploy.prototxt'
posted by Shai here, quoted below:
layer {
  name: "ip1_a"
  bottom: "data_a"
  top: "ip1_a"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w"  # NOTE THIS NAME!
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
}
layer {
  name: "ip1_b"
  bottom: "data_b"
  top: "ip1_b"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w"  # NOTE THIS NAME: it's the same!
    lr_mult: 10    # different LR for this branch
  }
  param {
    name: "ip1_b"
    lr_mult: 20
  }
}
# one layer to combine them
layer {
  type: "Concat"
  bottom: "ip1_a"
  bottom: "ip1_b"
  top: "ip1_combine"
  name: "concat"
}
layer {
  name: "joint_ip"
  type: "InnerProduct"
  bottom: "ip1_combine"
  top: "joint_ip"
  inner_product_param {
    num_output: 30
  }
}
I understand this model definition as:
 data_a           data_b
    |                |
    V                V
---------        ---------
| ip1_a |        | ip1_b |
---------        ---------
    |                |
  ip1_a            ip1_b
    |                |
    V                V
    ~~~~~~~~~~~~~~~~~~
            |
            V
      -------------
      |  concat   |
      -------------
            |
      ip1_combine
            |
            V
      -------------
      | joint_ip  |
      -------------
            |
        joint_ip
Blob ip1_a is trained by layer ip1_a, with weights initialized with ip1_w (lr_mult: 1) and bias initialized with ip1_b (lr_mult: 2). Blob ip1_a is actually the newly learned weights, which were initialized with ip1_w; the learned bias doesn't have a name.
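If that sharing is real, I expect it to be observable from pycaffe. Here is a minimal sketch of how I would check it (the file name deploy.prototxt is my own, and I assume the file also declares the data_a/data_b inputs, which the quoted snippet omits):

import numpy as np
import caffe

caffe.set_mode_cpu()
# Assumed file: the snippet above plus input declarations for
# data_a and data_b (not shown in the quoted prototxt).
net = caffe.Net('deploy.prototxt', caffe.TEST)

w_a = net.params['ip1_a'][0]  # weight blob of layer ip1_a (param "ip1_w")
w_b = net.params['ip1_b'][0]  # weight blob of layer ip1_b (same param name)

# If both layers hold the same underlying parameter blob, writing
# through one handle should be visible through the other.
w_a.data[...] = np.random.randn(*w_a.data.shape)
print(np.array_equal(w_a.data, w_b.data))  # expect True if weights are shared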
In some models, we can find layers that have:

lr_mult: 1
lr_mult: 2

where the first instance of lr_mult always corresponds to the weights and the second instance to the bias.
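As a sanity check on that ordering, my understanding is that net.params keeps a layer's parameter blobs in the order the param entries appear in the prototxt, so (reusing the assumed deploy.prototxt from above) I would expect:

import caffe

net = caffe.Net('deploy.prototxt', caffe.TEST)

# For InnerProduct/Convolution layers, param index 0 should be the
# weights and index 1 the bias, matching their order in the prototxt.
w, b = net.params['ip1_a']
print(w.data.shape)  # (10, D) -- num_output x input dimension
print(b.data.shape)  # (10,)   -- one bias per output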
Is my understanding above correct?