I want to scale the loss value of each image based on how close/far is the "current prediction" to the "correct label" during the training. For example if the correct label is "cat" and the network think it is "dog" the penalty (loss) should be less than the case if the network thinks it is a "car".
The way that I am doing is as following:
1- I defined a matrix of the distance between the labels,
2- pass that matrix as a bottom to the "softmaxWithLoss"
layer,
3- multiply each log(prob) to this value to scale the loss according to badness in forward_cpu
However I do not know what should I do in the backward_cpu
part. I understand the gradient (bottom_diff) has to be changed but not quite sure, how to incorporate the scale value here. According to the math I have to scale the gradient by the scale (because it is just an scale) but don't know how.
Also, seems like there is loosLayer in caffe called "InfoGainLoss"
that does very similar job if I am not mistaken, however the backward part of this layer is a little confusing:
bottom_diff[i * dim + j] = scale * infogain_mat[label * dim + j] / prob;
I am not sure why infogain_mat[]
is divide by prob
rather than being multiply by! If I use identity matrix for infogain_mat
isn't it supposed to act like softmax loss in both forward and backward?
It will be highly appreciated if someone can give me some pointers.