Using the same net twice is called a "Siamese network". The way it is implemented in caffe is by explicitly duplicating the network, but using the "name" param of each parameter blob to share a single copy of the underlying parameters. See this prototxt for an example.
Once you explicitly define the net twice, you can assign different "lr_mult" params to each copy.
So suppose your reference network N has an input layer (which I'll skip in this example) and an inner product layer named "ip1". Then the Siamese version might look like this:
layer {
  name: "ip1_a"
  bottom: "data_a"
  top: "ip1_a"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w" # NOTE THIS NAME!
    lr_mult: 1
  }
  param {
    name: "ip1_b"
    lr_mult: 2
  }
}
layer {
  name: "ip1_b"
  bottom: "data_b"
  top: "ip1_b"
  type: "InnerProduct"
  inner_product_param {
    num_output: 10
  }
  param {
    name: "ip1_w" # NOTE THIS NAME: it's the same!
    lr_mult: 10 # different LR for this branch
  }
  param {
    name: "ip1_b"
    lr_mult: 20
  }
}
# one layer to combine them
layer {
  type: "Concat"
  bottom: "ip1_a"
  bottom: "ip1_b"
  top: "ip1_combine"
  name: "concat"
}
layer {
  name: "joint_ip"
  type: "InnerProduct"
  bottom: "ip1_combine"
  top: "joint_ip"
  inner_product_param {
    num_output: 30
  }
}
If you finetune, you might need to do some net surgery in order for the original weights to be saved in the .caffemodel file under the shared names "ip1_w" and "ip1_b".
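A minimal pycaffe sketch of that surgery might look like the following (the file names original.prototxt, original.caffemodel, siamese.prototxt and siamese_init.caffemodel are hypothetical placeholders): load the trained single-branch net, copy its "ip1" weights into one branch of the Siamese net, and save. Since both branches share their param blobs via the "name" fields, copying into one branch is enough.

import caffe

# Hypothetical file names: the trained single-branch net
# and the new Siamese definition from above.
old_net = caffe.Net('original.prototxt', 'original.caffemodel', caffe.TEST)
new_net = caffe.Net('siamese.prototxt', caffe.TEST)

# Copy the trained "ip1" weights and bias into the "ip1_a" branch.
# Because the param blobs are shared (named "ip1_w"/"ip1_b"),
# the "ip1_b" branch sees the same underlying parameters.
new_net.params['ip1_a'][0].data[...] = old_net.params['ip1'][0].data  # weights ("ip1_w")
new_net.params['ip1_a'][1].data[...] = old_net.params['ip1'][1].data  # bias ("ip1_b")

new_net.save('siamese_init.caffemodel')

You can then finetune by passing siamese_init.caffemodel as the initial weights when you launch training.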