
I thought we might be able to compile a Caffeinated description of some methods of performing multi-category classification.

By multi-category classification I mean: the input data contains representations of multiple model output categories, and/or is simply classifiable under multiple model output categories.

E.g. an image containing a cat & dog would ideally output ~1 for both the cat & dog prediction categories and ~0 for all others.

  1. Based on this paper, this stale and closed PR, and this open PR, it seems Caffe is perfectly capable of accepting multiple labels. Is this correct?

  2. Would the construction of such a network require multiple neuron (inner product -> relu -> inner product) and softmax layers, as on page 13 of this paper; or do Caffe's InnerProduct & Softmax layers presently support multiple label dimensions?

  3. When I'm passing my labels to the network, which example illustrates the correct approach (if not both)?

    E.g. "Cat eating apple" (Note: Python syntax, but I use the C++ source.)

    Column 0 - Class is in input; Column 1 - Class is not in input

    [[1,0],  # Apple
     [0,1],  # Baseball
     [1,0],  # Cat
     [0,1]]  # Dog
    

    or

    Column 0 - Class is in input

    [[1],  # Apple
     [0],  # Baseball
     [1],  # Cat
     [0]]  # Dog
    

If anything lacks clarity, please let me know and I will generate pictorial examples of the questions I'm trying to ask.

asked by Aidan Gomez (edited by Shai)
  • do you mean a fixed number of labels per image, or a different number of labels for each image? That is, would you always expect to get, say, 2 labels, or would you expect 2 labels for some images and more for others? – Shai Oct 14 '15 at 05:51
  • @Shai the latter sounds like a really interesting problem! But in my case the label dimensions would be fixed. – Aidan Gomez Oct 14 '15 at 14:39

2 Answers


Nice question. I believe there is no single "canonical" answer here, and you may find several different approaches to tackling this problem. I'll do my best to show one possible way. It is slightly different from the question you asked, so I'll re-state the problem and suggest a solution.

The problem: given an input image and a set of C classes, indicate for each class if it is depicted in the image or not.

Inputs: at training time, inputs are pairs of an image and a C-dim binary vector indicating, for each of the C classes, whether it is present in the image.

Output: given an image, output a C-dim binary vector (same as the second form suggested in your question).
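
One possible way to feed such image/label pairs (just a sketch of mine, not the only option; the layer name and list-file name below are placeholders) is an "HDF5Data" layer, since an HDF5 source can carry a C-dim label vector per image:

    layer {
      name: "data"
      type: "HDF5Data"
      top: "data"     # input images, N x 3 x H x W
      top: "label"    # binary label matrix, N x C
      include { phase: TRAIN }
      hdf5_data_param {
        source: "train_h5_list.txt"  # placeholder: text file listing .h5 files
        batch_size: 32
      }
    }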

Making Caffe do the job: in order to make this work we need to modify the top layers of the net, using a different loss.
But first, let's understand the usual way Caffe is used, and then look into the changes needed.
The way things are now: an image is fed into the net, goes through conv/pooling/... layers, and finally goes through an "InnerProduct" layer with C outputs. These C predictions go into a "Softmax" layer that inhibits all but the most dominant class. Once a single class is highlighted, a "SoftmaxWithLoss" layer checks that the highlighted predicted class matches the ground-truth class.
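
For concreteness, here is a minimal sketch of such a "usual" top in prototxt (the bottom blob name "fc7" and the number of outputs are placeholders):

    layer {
      name: "score"
      type: "InnerProduct"
      bottom: "fc7"            # placeholder: last feature blob of the net
      top: "score"
      inner_product_param { num_output: 4 }  # C = number of classes
    }
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"  # single-label loss: picks one dominant class
      bottom: "score"
      bottom: "label"
      top: "loss"
    }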

What you need: the problem with the existing approach is the "Softmax" layer, which basically selects a single class. I suggest you replace it with a "Sigmoid" layer that maps each of the C outputs into an indicator of whether this specific class is present in the image. For training, you should use "SigmoidCrossEntropyLoss" instead of the "SoftmaxWithLoss" layer.
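
A minimal sketch of the modified top, under the same placeholder blob names as above. Note that "SigmoidCrossEntropyLoss" already applies the sigmoid internally, so an explicit "Sigmoid" layer is only needed at deploy time, to read out per-class probabilities:

    # TRAIN/VAL phase: loss over the C-dim binary label vector
    layer {
      name: "loss"
      type: "SigmoidCrossEntropyLoss"  # includes the sigmoid internally
      bottom: "score"
      bottom: "label"                  # C-dim binary vector per image
      top: "loss"
    }

    # DEPLOY phase: per-class probabilities in [0, 1]
    layer {
      name: "prob"
      type: "Sigmoid"
      bottom: "score"
      top: "prob"
    }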

answered by Shai
  • IMO, the reason that `Softmax` basically selects a single class is that `Softmax` normalizes the output into a probability distribution. But now caffe seems to already support a multi-label version of `Softmax` (as mentioned [here](http://stackoverflow.com/a/32697800/522000))? If so, what are the actual use cases for the multi-label version of `Softmax`? – mintaka Dec 17 '15 at 19:41
  • @mintaka the multi-label softmax you linked to is not the same as allowing multiple categories as in this case. – Shai Dec 17 '15 at 19:47
  • @mintaka the case here is that a **single** input can be labeled as several classes. In [this answer](http://stackoverflow.com/a/32697800/522000) the labeling is done **per-pixel**: that is each input is an image and each pixel in the image should get a label, but still it's a single label for single pixel, no pixel can be labeled as multiple classes. Do you see the difference? – Shai Dec 17 '15 at 20:54
  • Thanks, Shai. I understand the post here is talking about multi-label classification (i.e., an image belonging to multiple classes). Sorry I didn't state my reply clearly. Previously, I thought there was no multi-label implementation of `SoftmaxWithLossLayer`, until I saw that post (mentioned in my last reply). So I'm wondering if `SoftmaxWithLossLayer` in caffe now supports multi-label classification? – mintaka Dec 17 '15 at 22:15
  • Here (http://stackoverflow.com/questions/36538327/caffe-sigmoid-cross-entropy-loss) you said SigmoidCrossEntropyLoss includes a Sigmoid layer. So do we actually need a separate Sigmoid layer here? – user570593 Apr 22 '16 at 13:20
  • @user570593 use "SigmoidCrossEntropyLoss" for training and "Sigmoid" for deploy – Shai Apr 22 '16 at 13:42
  • Could you give me an example prototxt file for binary classification with "SigmoidCrossEntropyLoss"? Could you reply to http://stackoverflow.com/questions/36795427/how-to-use-sigmoidcrossentropyloss-in-caffe-for-binary-class-classification? – user570593 Apr 22 '16 at 13:44
  • I used a fully connected layer followed by SigmoidCrossEntropyLoss in the Caffe framework, but I sometimes get arbitrary loss values depending on the net configuration. Is that OK? I wonder if the loss must be between 0 and 1? – Somayyeh Ataei May 27 '18 at 14:32

Since one image can have multiple labels, the most intuitive way is to think of this problem as C independent binary classification problems, where C is the total number of different classes. This makes it easy to understand what @Shai has said:

add a "Sigmoid" layer that maps each of the C outputs into an indicator of whether this specific class is present in the image, and use "SigmoidCrossEntropyLoss" instead of the "SoftmaxWithLoss" layer. The total loss is the sum of these C sigmoid cross-entropy losses.

answered by kli_nlpr
  • When you say add a `Sigmoid` layer, do you mean first a `Sigmoid` layer and then a `SigmoidCrossEntropyLoss` layer? This is related to the `train_val.prototxt`. @Shai @kli_nlpr –  Nov 18 '16 at 13:01