
I am creating a network using Caffe, for which I need to define my own layer. I would like to use the Python layer for this.

My layer will contain some learned parameters. From this answer, I understand that I need to create a blob vector for this.

  1. Is there any specification that this blob needs to follow, such as constraints on its dimensions? Irrespective of what my layer does, can I create a one-dimensional blob and use each of its elements as a separate parameter in the layer's computation?
  2. What does the diff of a blob mean? From what I understand, the diff of bottom is the gradient at the current layer, and the diff of top is the gradient for the previous layer. However, what exactly is happening here?
  3. When do these parameters get trained? Does this need to be done manually in the layer definition?

I have seen the examples in test_python_layer.py, but most of them do not have any parameters.

GoodDeeds

1 Answer


You can add as many internal parameters as you wish, and these parameters (Blobs) may have whatever shape you want them to be.

To add Blobs (in your layer's class):

def setup(self, bottom, top):
  # each add_blob() call adds ONE blob; pass the desired shape as arguments
  self.blobs.add_blob(3, 4)    # first blob is 2D (3x4)
  self.blobs.add_blob(10)      # second blob is 1D with 10 elements
  self.blobs[0].data[...] = 0  # init first blob to zeros
  self.blobs[1].data[...] = 1  # init second blob to ones

What is the "meaning" of each parameter and how to organize them in self.blobs is entirely up to you.
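For instance, here is a rough sketch of how a parameter blob might enter the computation in forward. The per-channel scale idea, the blob shape (C,) and the 4D (N, C, H, W) input are all assumptions for illustration, not part of the question:

def reshape(self, bottom, top):
  top[0].reshape(*bottom[0].data.shape)

def forward(self, bottom, top):
  # hypothetical example: scale each channel of a 4D (N, C, H, W) input
  # by a learned per-channel weight stored in self.blobs[0] (shape (C,))
  w = self.blobs[0].data
  top[0].data[...] = bottom[0].data * w.reshape(1, -1, 1, 1)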

How do the trainable parameters get "trained"?
This is one of the cool things about caffe (and other DNN toolkits as well): you don't need to worry about it!
All you need to do is compute the gradient of the loss w.r.t. the parameters and store it in self.blobs[i].diff. Once the gradients are computed, caffe's internals take care of updating the parameters according to the gradients, learning rate, momentum, update policy, etc.
So, your layer must have a non-trivial backward method:

def backward(self, top, propagate_down, bottom):
  self.blobs[0].diff[...] = # gradient of the loss w.r.t. the first parameter blob
  self.blobs[1].diff[...] = # gradient of the loss w.r.t. the second parameter blob
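
Continuing the hypothetical per-channel scale layer sketched above (again, an illustration under those assumptions, not a general recipe), the backward pass might look like:

def backward(self, top, propagate_down, bottom):
  w = self.blobs[0].data
  # gradient of the loss w.r.t. the learned per-channel weights:
  # sum the chain-rule product over the batch and spatial dimensions
  self.blobs[0].diff[...] = (top[0].diff * bottom[0].data).sum(axis=(0, 2, 3))
  # gradient of the loss w.r.t. the input, passed down to the previous layer
  if propagate_down[0]:
    bottom[0].diff[...] = top[0].diff * w.reshape(1, -1, 1, 1)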

You might want to test your implementation of the layer once you complete it.
Have a look at this PR for a numerical test of the gradients.
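
If you just want a quick sanity check before wiring up that PR's test, a manual finite-difference comparison along these lines can help. This is only a sketch: the prototxt file name, the layer name 'mylayer', the input blob 'data' and the output blob 'loss' are placeholder assumptions, and it assumes the net's input is filled once so repeated forward passes are deterministic:

import numpy as np
import caffe

caffe.set_mode_cpu()

# placeholder prototxt: an Input layer named 'data', the Python layer under
# test named 'mylayer', and a loss layer whose single output blob is 'loss'
net = caffe.Net('my_layer.prototxt', caffe.TEST)
net.blobs['data'].data[...] = np.random.randn(*net.blobs['data'].data.shape)

def loss_value():
  return float(net.forward()['loss'])

# analytic gradient of the loss w.r.t. the first parameter blob
loss_value()
net.backward()
analytic = net.params['mylayer'][0].diff.copy()

# numeric gradient by central differences
w = net.params['mylayer'][0].data
eps = 1e-4
numeric = np.zeros_like(analytic)
for idx in np.ndindex(*w.shape):
  orig = w[idx]
  w[idx] = orig + eps
  lp = loss_value()
  w[idx] = orig - eps
  lm = loss_value()
  w[idx] = orig
  numeric[idx] = (lp - lm) / (2 * eps)

print('max abs difference:', np.abs(analytic - numeric).max())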

Shai
  • Thank you! If I want a particular blob to be a fixed parameter and not learn, I can set the diff to 0 right? – GoodDeeds Jun 08 '17 at 17:26
  • @GoodDeeds or set the learning rate to zero for this blob. see BatchNorm layer for example – Shai Jun 08 '17 at 17:28
  • I am getting an error with this. I followed the method of creating a blob as per your example, and I get an index out of range error when I try to access `self.blobs[1]`. There is no such error if I do `add_blob(1)` twice, but maybe there's a different logical error in that case. Could you please tell why this could be happening? – GoodDeeds Jun 08 '17 at 18:43
  • @GoodDeeds it is likely I got this wrong in my answer. Do what works for you! – Shai Jun 08 '17 at 18:45
  • Okay, thank you. That seems to work now. I also ran the code you wrote in the linked PR (thank you!), and I am facing the same issue as in [this comment](https://github.com/BVLC/caffe/pull/5157#issuecomment-299449635). The values in my case differ by a factor of 2 as well. Is this an issue with my layer, or is this a known issue (asking because this was reported before)? – GoodDeeds Jun 08 '17 at 18:52
  • @GoodDeeds it might be related to the PR. Did not get the chance to look at it yet – Shai Jun 08 '17 at 18:56