
I have seen that one can define a custom loss layer, for example EuclideanLoss, in caffe like this:

import caffe
import numpy as np


class EuclideanLossLayer(caffe.Layer):
    """
    Compute the Euclidean Loss in the same manner as the C++ EuclideanLossLayer
    to demonstrate the class interface for developing layers in Python.
    """

    def setup(self, bottom, top):
        # check input pair
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        # check input dimensions match
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        # difference is shape of inputs
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # loss output is scalar
        top[0].reshape(1)

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff**2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            if i == 0:
                sign = 1
            else:
                sign = -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num

However, I have a few questions regarding that code:

If I want to customise this layer and change the computation of the loss in this line:

top[0].data[...] = np.sum(self.diff**2) / bottom[0].num / 2.

Let's say to:

channelAxis = bottom[0].data.shape[1]
self.diff[...] = np.sum(bottom[0].data, axis=channelAxis) - np.sum(bottom[1].data, axis=channelAxis)
top[0].data[...] = np.sum(self.diff**2) / bottom[0].num / 2.

How do I have to change the backward function? For EuclideanLoss it is:

bottom[i].diff[...] = sign * self.diff / bottom[i].num

How does it have to look for my described loss?

What is the sign variable in the backward function for?

  • what weight and bias are there in Euclidean loss? – Shai Jun 21 '17 at 11:16
  • I am sorry, I confused myself a little bit as well. I have updated the question! @Shai –  Jun 21 '17 at 11:25
  • related: https://stackoverflow.com/a/33797142/1714410 – Shai Jun 21 '17 at 11:30
  • `top[0].data[...] = euclidean_weight * euclidean + other_weight * other` is not the right way to do this. You can have a **regular** Euclidean loss layer with `loss_weight: euclidean_weight` and your own `"OtherLoss"` layer with `loss_weight: other_weight`. – Shai Jun 21 '17 at 11:32
  • Okay, so if I want to create my own loss, do I need to change any lines in the backward function or is it sufficient to just edit that one line where the actual calculation is done? @Shai –  Jun 21 '17 at 11:36
  • you **MUST** change `backward()` as well: it must perform the gradient computation for the **specific** loss you are designing (see the sketch after this comment thread). Otherwise your backprop is meaningless. – Shai Jun 21 '17 at 11:37
  • Okay, and how does the backward() function have to be updated in relation to the loss calculation? I have given you an example for my loss calculation, but how would you update the backward computation? @Shai –  Jun 21 '17 at 11:42
  • your `other` is zero. It has to be a **scalar**. `bottom[0].diff = d(other)/d(bottom[0])` – Shai Jun 21 '17 at 11:54
  • What do you mean by "is zero"? My other is a subtraction of sums --> a scalar. So there should not be a problem? And what do you mean by d()? Could you edit / improve your question? Why do I not need the sign checking? @Shai –  Jun 21 '17 at 12:14
  • what is `shape` of `bottom[0].data`? what is `channelAxis`? if you `sum` only over `channelAxis` how do you expect to get a scalar? What with all other dimensions? – Shai Jun 21 '17 at 12:18
  • bottom[0].data has the same shape as for EuclideanLoss; channelAxis is the axis for the channels. I assume that my bottom blob is of shape (height, width, channels)? What I basically want to do is to compute a loss on the difference of the sums of my channels: loss = y1 - y2, where y1 = xg1 + xg2 + ... + xgn and y2 = xi1 + xi2 + ... + xin. Where xg1 = channel ground truth 1 and xi1 = channel result 1. Could you help me with that? @Shai –  Jun 21 '17 at 12:31
  • you forgot the batch size as the fourth dimension of the blob (aka `bottom[i].num`). – Shai Jun 21 '17 at 12:36
  • just the difference? your net will try to maximize `x`: the larger `sum(x)`, the lower the diff would be (you can have `loss = -inf` in theory). That does not sound like a good loss. – Shai Jun 21 '17 at 12:38
  • So bottom[i].num = batch_size? And bottom[i].data has shape (height, width, channels)? @Shai –  Jun 21 '17 at 12:40
  • `bottom[i].data` is an `np.array` of shape `(num, channel, height, width)`. `bottom[i].data.shape[0] == bottom[i].num` – Shai Jun 21 '17 at 12:42
  • Yes, loss = sum(y1 - y2) / some_value. To make it smaller I will add some_value. Isn't that basically the same as EuclideanLoss, since I take the difference and divide by something? EuclideanLoss takes the difference, squares it, and then divides it as well? @Shai –  Jun 21 '17 at 12:43
  • euclidean loss has zero as a lower bound, while your loss does not have a lower bound. That is quite NOT the same. – Shai Jun 21 '17 at 12:48
  • Okay, I have updated my question. Could you try and make it have a lower bound? I think you know what I want to do now, am I right? @Shai –  Jun 21 '17 at 12:49
  • What I basically want to do is: EuclideanLoss, but I want to reduce the channels to 1 by adding up all the channel values at each position in the height and width dimensions. Do you understand me? @Shai –  Jun 21 '17 at 13:43
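
A minimal sketch, not from the original thread, of how such a Python layer could look if the loss is taken as a Euclidean loss on the per-pixel channel sums (the layer name is made up; Caffe's (num, channels, height, width) blob layout is assumed). The point of the comment above is that backward() must implement the gradient of this specific loss: every channel of a given bottom receives the same per-pixel difference of the channel sums, divided by the batch size, with opposite signs for the two bottoms.

import caffe
import numpy as np


class ChannelSumEuclideanLossLayer(caffe.Layer):
    """Hypothetical sketch: Euclidean loss on the per-pixel channel sums."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        if bottom[0].data.shape != bottom[1].data.shape:
            raise Exception("Inputs must have the same shape.")
        # per-pixel difference of the channel sums: shape (num, 1, height, width)
        self.diff = np.zeros((bottom[0].num, 1) + bottom[0].data.shape[2:],
                             dtype=np.float32)
        top[0].reshape(1)

    def forward(self, bottom, top):
        # sum over the channel axis (axis 1 in Caffe's N x C x H x W layout)
        self.diff[...] = (bottom[0].data.sum(axis=1, keepdims=True)
                          - bottom[1].data.sum(axis=1, keepdims=True))
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        # d(loss)/d(x[n, c, h, w]) = +/- diff[n, 0, h, w] / num for every channel c,
        # so the per-pixel difference is broadcast over the channel axis
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num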

1 Answer


Although it can be a very educational exercise to implement the loss you are after as a "Python" layer, you can get the same loss using existing layers. All you need is to add a "Reduction" layer for each of your blobs before the regular "EuclideanLoss" layer:

layer {
  type: "Reduction"
  name: "rx1"
  bottom: "x1"
  top: "rx1"
  reduction_param { axis: 1 operation: SUM }
} 
layer {
  type: "Reduction"
  name: "rx2"
  bottom: "x2"
  top: "rx2"
  reduction_param { axis: 1 operation: SUM }
} 
layer {
  type: "EuclideanLoss"
  name: "loss"
  bottom: "rx1"
  bottom: "rx2"
  top: "loss"
}
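
As a rough numpy sketch of what this stack computes (the shapes below are made up for illustration): the "Reduction" layer with axis: 1 collapses everything from the channel axis onward into a single scalar per example, and the Euclidean loss is then taken on those scalars.

import numpy as np

# hypothetical blobs in Caffe's (num, channels, height, width) layout
x1 = np.random.rand(4, 16, 8, 8).astype(np.float32)
x2 = np.random.rand(4, 16, 8, 8).astype(np.float32)

# "Reduction" with axis: 1, operation: SUM -> one scalar per example
rx1 = x1.reshape(x1.shape[0], -1).sum(axis=1)   # shape (4,)
rx2 = x2.reshape(x2.shape[0], -1).sum(axis=1)   # shape (4,)

# "EuclideanLoss" on the reduced blobs
loss = np.sum((rx1 - rx2) ** 2) / rx1.shape[0] / 2.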

Update:
Based on your comment, if you only want to sum over the channel dimension and leave all other dimensions unchanged, you can use a fixed 1x1 conv (as you suggested):

layer {
  type: "Convolution"
  name: "rx1"
  bottom: "x1"
  top: "rx1"
  param { lr_mult: 0 decay_mult: 0 } # make this layer *fixed*
  convolution_param {
    num_output: 1
    kernel_size: 1
    bias_term: false  # no need for bias
    weight_filler { type: "constant" value: 1 } # sum
  }
}
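
A rough numpy sketch (with made-up shapes) of why this works: a 1x1 convolution with a single output channel, no bias, and all weights fixed to 1 multiplies each input channel by 1 and accumulates over the channels at every spatial location, so only the channel dimension is collapsed while height and width are preserved.

import numpy as np

# hypothetical input blob in Caffe's (num, channels, height, width) layout
x = np.random.rand(2, 16, 8, 8).astype(np.float32)

# the fixed 1x1 conv kernel: one output channel, every weight equal to 1
w = np.ones((1, 16, 1, 1), dtype=np.float32)    # (num_output, channels, 1, 1)

# the conv multiplies each channel by its weight and sums over channels
conv_out = (x * w).sum(axis=1, keepdims=True)   # shape (2, 1, 8, 8)

# which, with all-ones weights, is exactly the channel sum
assert np.allclose(conv_out, x.sum(axis=1, keepdims=True))
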
Shai
  • Okay, that is perfect! And now I can easily add a loss_weight and have two EuclideanLosses, am I right? –  Jun 21 '17 at 13:53
  • @thigi exactly. getting to know existing layers can allow you to be quite lazy ;) – Shai Jun 21 '17 at 13:54
  • I have thought about the solution and it is wrong! It sums over all values rather than over the channel values only. What I want is to create the sum like this: y = channel1 + channel2 + channel3 ... channelN. So the sum over the channels, rather than over all axes. Could you help me with that? I think one can use a Convolution layer with `num_output = 1` and `weight_filler = constant, value = 1`, would that be correct? Could you update your answer? –  Jun 22 '17 at 07:22
  • [question](https://stackoverflow.com/questions/44693043/is-there-a-layer-that-is-able-to-sum-over-all-channels-in-caffe) –  Jun 22 '17 at 07:36
  • Okay, thank you! What I do not understand: if the input, let's say, has 16 channels, how does the summation work? I thought caffe was treating each channel separately in convolution layers? Could you quickly extend your answer with a short explanation? That would just be for a better understanding :) @Shai –  Jun 22 '17 at 07:45