Considering this tutorial and this question: if we want to count the parameters that the Caffe framework uses, we can list the layer-wise parameter shapes with:

# Print each layer's weight shape (param[0]) and bias shape (param[1]).
for layer_name, param in net.params.items():
    print(layer_name + '\t' + str(param[0].data.shape), str(param[1].data.shape))

From the tutorial:

The param shapes typically have the form (output_channels, input_channels, filter_height, filter_width) (for the weights) and the 1-dimensional shape (output_channels,) (for the biases).

The output is:

conv1   (96, 3, 11, 11) (96,)
conv2   (256, 48, 5, 5) (256,)
conv3   (384, 256, 3, 3) (384,)
conv4   (384, 192, 3, 3) (384,)
conv5   (256, 192, 3, 3) (256,)
fc6 (4096, 9216) (4096,)
fc7 (4096, 4096) (4096,)
fc8 (1000, 4096) (1000,)

And according to the question, the parameter count is calculated with:

sum([prod(v[0].data.shape) for k, v in net.params.items()])

What happens to the bias? Shouldn't we add param[1] to the sum? What does Caffe put into the bias parameters (0, 1, or something else)? The fifth parameter is the bias, isn't it? Am I understanding this correctly?

Edit: If I multiply them with this code:

for k, v in net.params.items():
    weight_param = weight_param + prod(v[0].data.shape) * prod(v[1].data.shape)

it returns a huge number: 228224076800. Are those the real parameters used by the system?

  • Yes. `conv1` has 96 kernels each of size (11, 11, 3) and 96 bias values. One bias for each kernel. – Autonomous Jun 13 '17 at 17:59
  • @ParagS.Chandakkar should we add them to the other parameters or multiply them? How? – anderson kooper Jun 13 '17 at 18:11
  • Just add it. So in your example, `conv1` has `(96 * 11 * 11 * 3) + 96` parameters. – Autonomous Jun 13 '17 at 18:22
  • @ParagS.Chandakkar are you sure? Because 96 filters are used for each channel in the first layer? – anderson kooper Jun 13 '17 at 18:29
  • It should be `weight_param = weight_param + prod(v[0].data.shape) + prod(v[1].data.shape)`. Note the `+` sign instead of `*`. I do not know what you mean. There are 96 kernels, each of size `11 * 11 * 3`. I suggest you read [this](http://cs231n.github.io/convolutional-networks/). It also shows how to compute the parameters of the VGG network. – Autonomous Jun 13 '17 at 18:53
  • @ParagS.Chandakkar, okay, I understand now. One bias per 3 channels. Please make that an answer. – anderson kooper Jun 13 '17 at 19:00
  • Please see [this bvlc/caffe PR](https://github.com/BVLC/caffe/issues/2507) for more information. – Shai Jun 14 '17 at 06:13
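
To make the arithmetic from the comments concrete, here is a minimal sketch that does not need Caffe at all. It assumes the shapes are exactly those printed in the question; the `shapes` dictionary and the variable names are only for illustration. It sums weights and biases per layer, and also reproduces the multiplied figure from the edit to show why that number is so large.

```python
from math import prod  # available in Python 3.8+

# (weight shape, bias shape) per layer, copied from the output in the question.
shapes = {
    "conv1": ((96, 3, 11, 11), (96,)),
    "conv2": ((256, 48, 5, 5), (256,)),
    "conv3": ((384, 256, 3, 3), (384,)),
    "conv4": ((384, 192, 3, 3), (384,)),
    "conv5": ((256, 192, 3, 3), (256,)),
    "fc6": ((4096, 9216), (4096,)),
    "fc7": ((4096, 4096), (4096,)),
    "fc8": ((1000, 4096), (1000,)),
}

# Weights only, as in the one-liner from the linked question.
weights = sum(prod(w) for w, _ in shapes.values())

# Biases: one value per output channel (or per output unit for fc layers).
biases = sum(prod(b) for _, b in shapes.values())

# Adding the two gives the number of learnable parameters.
total = weights + biases

# Multiplying instead of adding, as in the edit, inflates the count.
multiplied = sum(prod(w) * prod(b) for w, b in shapes.values())

print("weights:   ", weights)     # 60954656
print("biases:    ", biases)      # 10568
print("total:     ", total)       # 60965224
print("multiplied:", multiplied)  # 228224076800
```

With a loaded network, the same total should come from `sum(prod(v[0].data.shape) + prod(v[1].data.shape) for v in net.params.values())`, assuming every layer in `net.params` has both a weight blob and a bias blob.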
