
I've modified the Caffe MNIST example to classify 3 classes of images. I noticed that if I set the number of outputs (num_output) to 3, my test accuracy drops horribly, down to the low 40% range. However, if I add one and use 4 outputs, the accuracy is in the 95% range.
I added an extra class of images to my dataset (so 4 classes) and noticed the same thing: if the number of outputs matched the number of classes, the result was horrible; with one more output than classes, it worked really well.

  inner_product_param {
    num_output: 3
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }

Does anyone know why this is? When I run the model I trained through the C++ example code on an image from my test set, it complains that I've told it there are 4 classes but supplied labels for only 3 in my labels file. If I invent a label and add it to the file, the program runs, but it then returns one of the classes with a probability of 1.0 no matter what image I give it.

Shai
Jack Simpson

1 Answer


Note that when fine-tuning and/or changing the number of labels, the input labels must always start from 0, because they are used as indices into the output probability vector when computing the loss.
Thus, if you have

 inner_product_param {
   num_output: 3
 }

then your training labels must be 0, 1 and 2 only.

If you use num_output: 3 with labels 1, 2, 3, Caffe is unable to represent label 3, and in fact has a redundant row corresponding to label 0 that is left unused.
As you observed, changing to num_output: 4 lets Caffe represent label 3 again and the results improve, but you still have an unused row in the parameter matrix.
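
To make the indexing argument concrete, here is a minimal pure-Python sketch; it is not Caffe code. The helper names remap_labels and softmax_loss are hypothetical, and the explicit range check is an assumption added for illustration (it is not a claim about how Caffe itself reacts to an out-of-range label). It remaps labels 1, 2, 3 to 0, 1, 2 and shows that the loss reads the score at position label, so a label of 3 has no matching output when there are only 3 of them.

```python
import math


def remap_labels(labels):
    """Map arbitrary integer labels to contiguous indices 0..K-1 (hypothetical helper)."""
    classes = sorted(set(labels))
    index_of = {c: i for i, c in enumerate(classes)}
    return [index_of[c] for c in labels], index_of


def softmax_loss(scores, label):
    """Negative log-probability of `label`; the label is used as an index into `scores`."""
    # Illustrative guard only: this check is an assumption of the sketch.
    if not 0 <= label < len(scores):
        raise IndexError(
            "label %d has no output unit (num_output=%d)" % (label, len(scores))
        )
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[label] / sum(exps))


raw_labels = [1, 2, 3, 3, 1, 2]           # labels that start from 1
labels, index_of = remap_labels(raw_labels)
num_output = len(index_of)                # value to use for num_output

scores = [0.0] * num_output               # dummy scores from a 3-output layer
loss_ok = softmax_loss(scores, labels[2])  # remapped label 2 is a valid index
```

With the remapped labels, num_output can equal the number of classes and no row of the parameter matrix is left unused; calling softmax_loss(scores, 3) with the original label instead fails the range check.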

Shai
  • Is this also the case for regression? I have 4 classes in my regression problem, but I used only one output neuron. Is that OK? You can see my question at http://stackoverflow.com/questions/39756886/counting-with-regression-in-caffe-training-loss-reducing-but-predicted-values-r?noredirect=1#comment66825237_39756886 Could you kindly point out what the real problem is? – khan Sep 29 '16 at 21:19