
I want to modify the existing softmax loss layer in Caffe. The idea is to add a weight factor to the loss. For instance, if we are processing a pixel that belongs to the car class, I want to apply a factor of 2 to its loss, because in my case detecting the car class is more important than, say, the dog class. This is the original source code:

template <typename Dtype>
__global__ void SoftmaxLossForwardGPU(const int nthreads,
          const Dtype* prob_data, const Dtype* label, Dtype* loss,
          const int num, const int dim, const int spatial_dim,
          const bool has_ignore_label_, const int ignore_label_,
          Dtype* counts) {
  CUDA_KERNEL_LOOP(index, nthreads) {
    const int n = index / spatial_dim;
    const int s = index % spatial_dim;
    const int label_value = static_cast<int>(label[n * spatial_dim + s]);
    if (has_ignore_label_ && label_value == ignore_label_) {
      loss[index] = 0;
      counts[index] = 0;
    } else {
      loss[index] = -log(max(prob_data[n * dim + label_value * spatial_dim + s],
                      Dtype(FLT_MIN)));
      counts[index] = 1;
    }
  }
}

You can find this code in https://github.com/BVLC/caffe/blob/master/src/caffe/layers/softmax_loss_layer.cu
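
To make the objective concrete, the per-pixel loss I have in mind is roughly the following (using p[label] for the softmax probability of the ground-truth class and w[label] for the per-class factor; the weight values are just an example):

    standard loss:  loss = -log( p[label] )
    weighted loss:  loss = -w[label] * log( p[label] )      e.g. w[car] = 2, w[dog] = 1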

The following code shows the modifications I made in order to achieve my objective:

template <typename Dtype>
__global__ void SoftmaxLossForwardGPU(const int nthreads,
          const Dtype* prob_data, const Dtype* label, Dtype* loss,
          const int num, const int dim, const int spatial_dim,
          const bool has_ignore_label_, const int ignore_label_,
          Dtype* counts) {
  const float weights[4] = {3.0f, 1.0f, 1.0f, 0.5f};
  CUDA_KERNEL_LOOP(index, nthreads) {
    const int n = index / spatial_dim;
    const int s = index % spatial_dim;
    const int label_value = static_cast<int>(label[n * spatial_dim + s]);
    if (has_ignore_label_ && label_value == ignore_label_) {
      loss[index] = 0;
      counts[index] = 0;
    } else {
      loss[index] = -log(max(prob_data[n * dim + label_value * spatial_dim + s],
                      Dtype(FLT_MIN))) * weights[label_value];
      counts[index] = 1;
    }
  }
}

I am not sure that this modification is doing what I want, for several reasons:

  1. I am not sure what each value in this function means. I am assuming, for instance, that label_value corresponds to the ground-truth label, but I am not sure.

  2. I do not understand this line at all: prob_data[n * dim + label_value * spatial_dim + s]. Where is the loss estimated here? I assume the loss calculation happens in this line, and that is why I put my weights here, but I cannot see any calculation here - only an access to a specific position of the prob_data vector.

I know my code proposal is not the best one. At some point I would like to turn these weights into an input of the layer, but right now I do not have enough knowledge to do it (if you could also give me some hints on how to achieve that, it would be great).

Dhorka
  • Please remember that requesting an off-site resource such as an article is off-topic on Stack Overflow. You are advised to remove or rephrase point 1. – E_net4 Aug 28 '17 at 16:57

1 Answer

  1. Implementing your own layer in Caffe is a very nice skill to have, but you should do it only as a "last resort". There are quite a few existing layers, and you can usually achieve what you want using existing layer(s).

  2. You cannot modify the forward_gpu implementation without modifying forward_cpu as well. More importantly, you MUST modify the backward functions as well - otherwise the gradients updating your weights will not reflect your modified loss (see the backward sketch after this list).

  3. "SoftmaxWithLoss" layer is a special case of the loss "InfogainLoss" layer. If you want to have different "weight" for each class, you can simply use "InfogainLoss" with weight matrix H according to your weights.
    If you want to have spatially varying weight (different weight for different location) you can look at PR #5828, implementing "WeightedSoftmaxWithLoss".
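
To illustrate point 2: in the stock layer, Backward_gpu first copies the softmax probabilities into bottom_diff and then launches SoftmaxLossBackwardGPU, which subtracts 1 at the ground-truth channel. If the forward loss of a pixel is scaled by weights[label_value], the gradient of that pixel has to be scaled by the same factor. A rough, untested sketch of what that could look like (the kernel name and the hard-coded weight table are just examples, and a matching CPU version would be needed too):

// Rough sketch only - mirrors Caffe's SoftmaxLossBackwardGPU; the name
// WeightedSoftmaxLossBackwardGPU and the hard-coded table are examples.
// Before this kernel runs, bottom_diff is assumed to already hold a copy
// of the softmax probabilities, as in the stock Backward_gpu.
template <typename Dtype>
__global__ void WeightedSoftmaxLossBackwardGPU(const int nthreads,
          const Dtype* top, const Dtype* label, Dtype* bottom_diff,
          const int num, const int dim, const int spatial_dim,
          const bool has_ignore_label_, const int ignore_label_,
          Dtype* counts) {
  const float weights[4] = {3.0f, 1.0f, 1.0f, 0.5f};  // same table as the forward kernel
  const int channels = dim / spatial_dim;
  CUDA_KERNEL_LOOP(index, nthreads) {
    const int n = index / spatial_dim;
    const int s = index % spatial_dim;
    const int label_value = static_cast<int>(label[n * spatial_dim + s]);
    if (has_ignore_label_ && label_value == ignore_label_) {
      for (int c = 0; c < channels; ++c) {
        bottom_diff[n * dim + c * spatial_dim + s] = 0;
      }
      counts[index] = 0;
    } else {
      // gradient of -log(p[label]) w.r.t. the logits is (p - one_hot(label))
      bottom_diff[n * dim + label_value * spatial_dim + s] -= 1;
      // gradient of -w * log(p[label]) is w * (p - one_hot(label)),
      // so every channel of this pixel is scaled by the same class weight
      for (int c = 0; c < channels; ++c) {
        bottom_diff[n * dim + c * spatial_dim + s] *= weights[label_value];
      }
      counts[index] = 1;
    }
  }
}

Note that counts is still 0 or 1 here, so the loss is normalized by the number of contributing pixels rather than by the sum of the weights; whether that is the behaviour you want depends on how you define the weighted loss.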
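
For point 3, a sketch of what the InfogainLoss configuration could look like (the layer/blob names and the file name are placeholders; H is an L-by-L matrix stored as a 1x1xLxL blob with the per-class weights on the diagonal, and depending on the Caffe version the layer either applies softmax internally or expects the output of a Softmax layer as its first bottom):

layer {
  name: "loss"
  type: "InfogainLoss"
  bottom: "prob"    # softmax probabilities (or raw scores, depending on the Caffe version)
  bottom: "label"
  top: "loss"
  infogain_loss_param {
    source: "class_weights.binaryproto"  # e.g. a 1x1x4x4 blob with diag(3.0, 1.0, 1.0, 0.5)
  }
}

With H equal to the identity matrix this reduces to the usual SoftmaxWithLoss behaviour, which is the sense in which SoftmaxWithLoss is a special case of InfogainLoss; H can also be passed to the layer as an optional third bottom blob instead of being loaded from a file.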

Shai
  • Hi Shai, thanks again for your answer. The thing is, the per-class weights are only the first step of my experiments; for that reason I decided to implement my own layer based on SoftmaxWithLoss. Regarding InfogainLoss: does it use the same loss formula as SoftmaxWithLoss? Where can I find a theoretical explanation of that? The first thing I need to understand is how the loss implemented in SoftmaxWithLoss works. I read in some places that this loss is the same as cross entropy, but I am not sure about this. – Dhorka Aug 28 '17 at 12:42
  • Another question I have is about the modification of the backward function: why would the gradients updating my weights not reflect the modified loss? – Dhorka Aug 28 '17 at 12:43
  • @Dhorka see [this thread](https://stackoverflow.com/a/34917052/1714410) for more information regarding the connection between softmax loss and infogain loss – Shai Aug 28 '17 at 12:47