
I have a pretrained network (let's call it N) that I would like to use twice within a new network. Does anybody know how to duplicate it? I would then like to assign a different learning rate to each copy.

For example (N1 is the 1st copy of N, N2 is the 2nd copy of N), the new network might look like:

N1 --> [joint ip 
N2 -->    layer]

I know how to reuse N with a single copy; however, since N1 and N2 will have different (finetuning) learning rates, I don't know how to make two copies of N and assign a different learning rate to each.

Thanks!

– Yuval Atzmon

1 Answer


Using the same net twice is called a "Siamese network". The way it is implemented in Caffe is by explicitly duplicating the layers, while using the "name" param of each parameter blob so that both copies share a single copy of the underlying parameters. See this prototxt for an example.
Once you explicitly define the net twice, you can assign different "lr_mult" params to each copy.

So suppose your reference network N has an input layer (which I'll skip in this example) and an inner product layer named "ip1". Then

 layer {
   name: "ip1_a"
   bottom: "data_a"
   top: "ip1_a"
   type: "InnerProduct"
   inner_product_param {
     num_output: 10
   }
   param {
     name: "ip1_w"  # NOTE THIS NAME!
     lr_mult: 1
   }
   param {
     name: "ip1_b"
     lr_mult: 2
   }
 }
 layer {
   name: "ip1_b"
   bottom: "data_b"
   top: "ip1_b"
   type: "InnerProduct"
   inner_product_param {
     num_output: 10
   }
   param {
     name: "ip1_w"  # NOTE THIS NAME: it's the same!
     lr_mult: 10 # different LR for this branch
   }
   param {
     name: "ip1_b"
     lr_mult: 20
   }
 }
 # one layer to combine them     
 layer {
   type: "Concat"
   bottom: "ip1_a"
   bottom: "ip1_b"
   top: "ip1_combine"
   name: "concat"
 }
 layer {
   name: "joint_ip"
   type: "InnerProduct"
   bottom: "ip1_combine"
   top: "joint_ip"
   inner_product_param {
     num_output: 30
   }
 } 

If you finetune, you might need to do some net surgery so that the original weights are saved in the .caffemodel file under the names "ip1_w" and "ip1_b".
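For reference, here is a minimal pycaffe sketch of that surgery, assuming placeholder file names original.prototxt / original.caffemodel for the pretrained net N and siamese.prototxt for the duplicated net above (which must also define its data layers):

 import caffe

 # Load the pretrained single-branch net and the duplicated ("siamese") net.
 # All file names here are placeholders.
 old_net = caffe.Net('original.prototxt', 'original.caffemodel', caffe.TEST)
 new_net = caffe.Net('siamese.prototxt', caffe.TEST)

 # Copy the pretrained "ip1" weights/bias into one branch; since both branches
 # share the named param blobs "ip1_w"/"ip1_b", the other branch gets them too.
 new_net.params['ip1_a'][0].data[...] = old_net.params['ip1'][0].data  # weights
 new_net.params['ip1_a'][1].data[...] = old_net.params['ip1'][1].data  # bias

 # Save a .caffemodel that matches the duplicated prototxt.
 new_net.save('siamese_init.caffemodel')

You can then start finetuning from siamese_init.caffemodel.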

– Shai
  • Thank you Shai, Toda! I do need to do some net surgery; thank you for pointing out the keyword. Here is a link with an example for future reference: http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb – Yuval Atzmon Nov 29 '15 at 21:12
  • Hi Shai, I get some unexpected behavior when I apply this method. Any chance that the weights are shared between N1 and N2? If yes, is there a way to disable it? I would like N1 and N2 to be independent copies of N. Thanks! – Yuval Atzmon Dec 04 '15 at 14:07
  • @user2476373 In that case, you might not need to give names to the parameter blobs; just use the same "name" for the entire layer. – Shai Dec 05 '15 at 19:41
  • That doesn't work. When I give two layers the same name, the loaded values are not loaded into both (not sure whether they go into either of them, or just into the 2nd). Anyway, I solved the issue by training a siamese network for a single step (with 0 learning rate), giving each duplicated layer a different name but letting them share the weights, then saving a snapshot of this duplicated network and initializing the weights of my network from that snapshot. Thank you again for your assistance here :) – Yuval Atzmon Dec 06 '15 at 03:46
  • @user2476373 Nice workaround. You could do all these weight manipulations in Python (see the sketch after these comments). – Shai Dec 06 '15 at 05:06
  • This example shows using different `lr_mult` values, but this doesn't work in the latest caffe. Instead it will give an error: "Shared param 'ip1w' has mismatched lr_mult." – ferrouswheel Mar 20 '17 at 01:01
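Following up on the comments above: if N1 and N2 should be truly independent copies (which also sidesteps the "mismatched lr_mult" error mentioned in the last comment, since nothing is shared), an alternative to the single-step snapshot trick is to drop the shared "name" params from the prototxt and copy the pretrained weights into both branches directly. A minimal pycaffe sketch, assuming a hypothetical two_copies.prototxt in which "ip1_a" and "ip1_b" have no shared param names (file names are placeholders):

 import caffe

 # Pretrained single-branch net and a variant prototxt whose two branches
 # do NOT share param names, so each branch owns its own weights.
 old_net = caffe.Net('original.prototxt', 'original.caffemodel', caffe.TEST)
 new_net = caffe.Net('two_copies.prototxt', caffe.TEST)

 # Copy the pretrained "ip1" parameters into both branches independently.
 for branch in ('ip1_a', 'ip1_b'):
     new_net.params[branch][0].data[...] = old_net.params['ip1'][0].data  # weights
     new_net.params[branch][1].data[...] = old_net.params['ip1'][1].data  # bias

 new_net.save('two_copies_init.caffemodel')

Since the params are no longer shared, each branch can have its own lr_mult.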