
I wish to use a loss layer of type InfogainLoss in my model, but I am having difficulty defining it properly.

  1. Is there any tutorial/example on the usage of INFOGAIN_LOSS layer?

  2. Should the input to this layer, the class probabilities, be the output of a SOFTMAX layer, or is it enough to input the "top" of a fully connected layer?

INFOGAIN_LOSS requires three inputs: class probabilities, labels and the matrix H. The matrix H can be provided either as a layer parameter, infogain_loss_param { source: "filename" }, or as a third input ("bottom") to the layer.
Suppose I have a python script that computes H as a numpy.array of shape (L,L) with dtype='f4' (where L is the number of labels in my model).

  1. How can I convert my numpy.array into a binaryproto file that can be provided as an infogain_loss_param { source } to the model?

  2. Suppose I want H to be provided as the third input (bottom) to the loss layer (rather than as a model parameter). How can I do this?
    Do I define a new data layer whose "top" is H? If so, wouldn't the data of this layer advance every training iteration, just like the training data does? How can I define multiple unrelated input "data" layers, and how does caffe know to read from the training/testing "data" layer batch after batch, while reading from the H "data" layer only once for the entire training process?

Shai

3 Answers


1. Is there any tutorial/example on the usage of InfogainLoss layer?:
A nice example can be found here: using InfogainLoss to tackle class imbalance.


2. Should the input to this layer, the class probabilities, be the output of a Softmax layer?
Historically, the answer used to be YES, according to Yair's answer below: the old implementation of "InfogainLoss" required its input to be the output of a "Softmax" layer (or of any other layer that ensures the input values are in the range [0..1]).

The OP noticed that using "InfogainLoss" on top of a "Softmax" layer can lead to numerical instability. His pull request, combining these two layers into a single one (much like the "SoftmaxWithLoss" layer), was accepted and merged into the official Caffe repository on 14/04/2017. The mathematics of this combined layer are given here.

The upgraded layer's "look and feel" is exactly like the old one's, apart from the fact that one no longer needs to explicitly pass the input through a "Softmax" layer.
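
For reference, my understanding of what the combined layer computes (a rough sketch in pseudo-math; x are the raw scores fed to the layer, p the internally computed softmax probabilities, l_n the ground-truth label of sample n, and N the normalization constant):

loss = -1/N * sum_n sum_k H[l_n, k] * log( p[n, k] ),   where  p[n, :] = softmax( x[n, :] )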


3. How can I convert a numpy.array into a binaryproto file:

In python:

import numpy as np
import caffe

# L is the number of labels in the model; here H is simply the identity matrix
H = np.eye( L, dtype = 'f4' )
# wrap H in a 1x1xLxL blob and serialize it to a binaryproto file
blob = caffe.io.array_to_blobproto( H.reshape( (1,1,L,L) ) )
with open( 'infogainH.binaryproto', 'wb' ) as f :
    f.write( blob.SerializeToString() )
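
To sanity check the resulting file, you can read it back (a minimal sketch; assumes the file name used above):

import caffe
from caffe.proto import caffe_pb2

blob = caffe_pb2.BlobProto()
with open( 'infogainH.binaryproto', 'rb' ) as f :
    blob.ParseFromString( f.read() )
H_back = caffe.io.blobproto_to_array( blob )   # shape (1, 1, L, L)
print( H_back.squeeze() )                      # should print the original H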

Now you can add the INFOGAIN_LOSS layer to the model prototxt, with H provided as a parameter:

layer {
  bottom: "topOfPrevLayer"
  bottom: "label"
  top: "infoGainLoss"
  name: "infoGainLoss"
  type: "InfogainLoss"
  infogain_loss_param {
    source: "infogainH.binaryproto"
  }
}
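
If H is meant to re-weight the classes (as in the class-imbalance example linked above) rather than be the identity, a diagonal H is a common choice. A minimal sketch, assuming a hypothetical array cls_num holding the number of training samples per class:

import numpy as np
import caffe

cls_num = np.array( [1000., 50., 200.] )   # hypothetical per-class sample counts
weights = cls_num.max() / cls_num          # inverse-frequency weights
weights = weights / weights.sum()          # normalize the weights to sum to 1
L = len( cls_num )
H = np.diag( weights ).astype( 'f4' )      # diagonal H: per-class loss weights

blob = caffe.io.array_to_blobproto( H.reshape( (1,1,L,L) ) )
with open( 'infogainH.binaryproto', 'wb' ) as f :
    f.write( blob.SerializeToString() )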

4. How to load H as part of a DATA layer

Quoting Evan Shelhamer's post:

There's no way at present to make data layers load input at different rates. Every forward pass all data layers will advance. However, the constant H input could be done by making an input lmdb / leveldb / hdf5 file that is only H since the data layer will loop and keep loading the same H. This obviously wastes disk IO.
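
Following that suggestion, one way to do it with an HDF5 file could look like this (a sketch, not tested; the file names are arbitrary):

import h5py
import numpy as np

# H is the (L, L) infogain matrix computed earlier
with h5py.File( 'H.h5', 'w' ) as f :
    f['H'] = H.reshape( (1,1,L,L) ).astype( 'f4' )   # leading axis acts as a "batch" axis of size 1

with open( 'H_list.txt', 'w' ) as f :
    f.write( 'H.h5\n' )   # HDF5Data layers read a text file listing the hdf5 files

with a matching HDF5Data layer in the prototxt (the dataset name in the hdf5 file must match the "top"):

layer {
  name: "H_data"
  type: "HDF5Data"
  top: "H"
  hdf5_data_param {
    source: "H_list.txt"
    batch_size: 1
  }
}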

Shai
  • This solves parts 3 and 4 completely, and is a perfect solution for anyone who wants to re-weight (or balance) the per-class losses in Caffe, using the InfoGain layer and the prototxt definition of H. – AruniRC Apr 30 '18 at 22:43

The layer is summing up

-log(p_i)

and so the p_i's need to be in (0, 1] to make sense as a loss function (otherwise higher confidence scores will produce a higher loss). See the curve below for the values of log(p).

[figure: plot of log(p)]

I don't think they have to sum up to 1, but passing them through a Softmax layer will achieve both properties.
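
A quick numeric illustration of the range issue (plain numpy, just for intuition):

import numpy as np

print( -np.log( 0.9 ) )   # ~0.105 : small loss for a confident prediction
print( -np.log( 0.1 ) )   # ~2.303 : large loss for an unconfident prediction
print( -np.log( 1.5 ) )   # ~-0.405: negative "loss" once p leaves (0, 1]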

Yair
  • it's not exactly this sum, there are also weights – Shai May 04 '15 at 19:34
  • Yes, there are the weights, but they are not the factor that constrains the value of p_i. log(p) < 0 for 0 < p < 1, so -log(p) is positive and increases as p becomes smaller; for p > 1, log(p) > 0, so -log(p) becomes negative, which is the opposite of what the loss function should do. That is why p needs to be in (0, 1]. log(0) = -infinity, so care should be taken not to have p=0, but that is already implemented in the layer itself. – Yair May 05 '15 at 13:24
  • So, if I understand correctly, your answer to question number 2 is: the input to `info_gain` layer should be the "top" of a `softmax` layer. – Shai May 05 '15 at 14:00
  • Yes. That would be a good way to be sure the values are in the correct range. – Yair May 18 '15 at 14:43

Since I had to search through many websites to piece together the complete code, I thought I would share my implementation:

A Python layer for computing the H matrix with a weight for each class:

import numpy as np
import caffe


class ComputeH(caffe.Layer):
    def __init__(self, p_object, *args, **kwargs):
        super(ComputeH, self).__init__(p_object, *args, **kwargs)
        self.n_classes = -1

    def setup(self, bottom, top):
        # the only bottom is the label blob of the current batch
        if len(bottom) != 1:
            raise Exception("Need (only) one input to compute H matrix.")

        # the number of classes is passed via param_str in the prototxt
        params = eval(self.param_str)
        if 'n_classes' in params:
            self.n_classes = int(params['n_classes'])
        else:
            raise Exception('The number of classes (n_classes) must be specified.')

    def reshape(self, bottom, top):
        # H is emitted as a 1x1xLxL blob, as InfogainLoss expects
        top[0].reshape(1, 1, self.n_classes, self.n_classes)

    def forward(self, bottom, top):
        # count how often each class appears in the current batch of labels
        classes, cls_num = np.unique(bottom[0].data, return_counts=True)

        if np.size(classes) != self.n_classes or self.n_classes == -1:
            raise Exception("Invalid number of classes")

        cls_num = cls_num.astype(float)

        # inverse-frequency weights, normalized to sum to 1
        cls_num = cls_num.max() / cls_num
        weights = cls_num / np.sum(cls_num)

        # diagonal H: each class is weighted by its normalized inverse frequency
        top[0].data[...] = np.diag(weights)

    def backward(self, top, propagate_down, bottom):
        # H does not depend on any learnable parameter: nothing to backpropagate
        pass

and the relevant part from the train_val.prototxt:

layer {
    name: "computeH"
    bottom: "label"
    top: "H"
    type: "Python"
    python_param {
        module: "digits_python_layers"
        layer: "ComputeH"
        param_str: '{"n_classes": 7}'
    }
    exclude { stage: "deploy" }
}
layer {
  name: "loss"
  type: "InfogainLoss"
  bottom: "score"
  bottom: "label"
  bottom: "H"
  top: "loss"
  infogain_loss_param {
    axis: 1  # compute loss and probability along axis
  }
  loss_param {
      normalization: 0
  }
  exclude {
    stage: "deploy"
  }
}
  • I have 2 classes and my batch size for TRAIN/TEST is 16; what if, while sampling, all the labels in a batch are the same (i.e. all from the majority class)? This makes `np.unique(bottom[0].data, return_counts=True)` return `(array([1.], dtype=float32), array([16]))`, which means my `num_classes` gets assigned `1` instead of `2`. – Tanmay Agrawal Feb 14 '20 at 10:56
  • If you know the number of classes in advance, you can specify it as a constant. – Torsten Straßer Feb 18 '20 at 09:24