I am finetuning a network. In one case I want to use it for regression, which works. In another case, I want to use it for classification.

For both cases I have an HDF5 file with a label. For regression, this is just a 1-by-1 numpy array containing a float. I thought I could use the same label for classification after changing my EuclideanLoss layer to SoftmaxWithLoss. However, I then get a negative loss, like so:

    Iteration 19200, loss = -118232
    Train net output #0: loss = 39.3188 (* 1 = 39.3188 loss)

Can you explain if, and if so what, goes wrong? I do see that the training loss is about 40 (which is still terrible), but does the network still train? The negative loss just keeps getting more negative.

UPDATE
After reading Shai's comment and answer, I have made the following changes:
- I changed the num_output of my last fully connected layer to 6, as I have 6 labels (it used to be 1).
- I now create a one-hot vector and pass that as the label into my HDF5 dataset, as follows:

    f['label'] = numpy.array([1, 0, 0, 0, 0, 0])        

Trying to run my network now returns

    Check failed: hdf_blobs_[i]->shape(0) == num (6 vs. 1)

After some research online, I reshaped the vector to a 1x6 vector. This led to the following error:

    Check failed: outer_num_ * inner_num_ == bottom[1]->count() (40 vs. 240)
    Number of labels must match number of predictions; e.g., if softmax axis == 1
    and prediction shape is (N, C, H, W), label count (number of labels)
    must be N*H*W, with integer values in {0, 1, ..., C-1}.

My idea is to add one label per data sample (image), and in my train.prototxt I create batches. Shouldn't this create the correct batch size?


1 Answer

Since you moved from regression to classification, you need to output not a scalar to compare with "label", but rather a probability vector of length num-labels to compare with the discrete class "label". You need to change the num_output parameter of the layer before "SoftmaxWithLoss" from 1 to num-labels.
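
For example, here is a minimal sketch of the tail of the net in prototxt (the layer and blob names "fc_out" and "fc7" are placeholders; use the names from your own train.prototxt):

    layer {
      name: "fc_out"            # your last fully connected layer
      type: "InnerProduct"
      bottom: "fc7"             # placeholder: output of the preceding layer
      top: "fc_out"
      inner_product_param {
        num_output: 6           # was 1 for regression; now one output per class
      }
    }
    layer {
      name: "loss"
      type: "SoftmaxWithLoss"
      bottom: "fc_out"
      bottom: "label"
      top: "loss"
    }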

I believe you are currently accessing uninitialized memory, and I would expect caffe to crash sooner or later in this case.

Update:
You made two changes: num_output 1-->6, and you also changed your input label from a scalar to a vector.
The first change was the only one you needed for "SoftmaxWithLoss".
Do not change the label from a scalar to a "hot-vector": with a batch of 40 images, your 1x6 labels amount to 40*6 = 240 label values where "SoftmaxWithLoss" expects exactly 40 (one integer per image), hence the (40 vs. 240) check failure.
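
A minimal sketch of writing such scalar labels with h5py (file and dataset names are just examples; Caffe reads HDF5 labels as float32, but their values should be whole numbers in {0, ..., 5}):

    import h5py
    import numpy as np

    N = 40  # number of images in this HDF5 file
    with h5py.File('train.h5', 'w') as f:
        # dummy image data, shape (N, C, H, W)
        f['data'] = np.random.rand(N, 3, 224, 224).astype(np.float32)
        # one scalar class index per image -- NOT a one-hot vector
        f['label'] = np.random.randint(0, 6, size=(N, 1)).astype(np.float32)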

Why?
Because "SoftmaxWithLoss" basically looks at the 6-vector prediction you output, interpret the ground-truth label as index and looks at -log(p[label]): the closer p[label] is to 1 (i.e., you predicted high probability for the expected class) the lower the loss. Making a prediction p[label] close to zero (i.e., you incorrectly predicted low probability for the expected class) then the loss grows fast.


Using a "hot-vector" as ground-truth input label, may give rise to multi-category classification (does not seems like the task you are trying to solve here). You may find this SO thread relevant to that particular case.

  • Thanks for the suggestions! Strangely enough, caffe didn't crash. I updated my question with how I understood your suggestions and the results! – Cassie Dec 08 '16 at 09:16
  • Thank you! My program behaves more like I expected! My Iteration loss is still negative, however. Any ideas what causes this? – Cassie Dec 08 '16 at 10:09
  • @user4039874 using `"SoftmaxWithLoss"` should never result in negative values. Something fishy is going on. (1) Are you sure you have no other loss layers in your model? (2) Are you sure all `label` values in the HDF5 are *integers* with values `{0,1,2,3,4,5}` only? Is it possible you have some examples with `label==6`? – Shai Dec 08 '16 at 10:17
  • Silly of me. My labels didn't start at 0. I changed that and now the model is training. Thanks again! – Cassie Dec 08 '16 at 10:29
  • @user4039874 now the question is why didn't you get a segmentation fault? is it possible your caffe is compiled with debug flags? – Shai Dec 08 '16 at 10:31
  • That is very much possible. My caffe runs on a server and I had to try several configurations to get it to install and make it usable. – Cassie Dec 08 '16 at 10:45
  • @user4039874 in that case you might not get optimal performance (in terms of run time and memory usage) from your caffe... – Shai Dec 08 '16 at 10:48
  • For this case, that is not a problem. But I will definitely keep track of it in the future! – Cassie Dec 08 '16 at 11:11
  • Hi Shai, I have the same problem at https://stackoverflow.com/questions/46280783/video-classification-using-hdf5-in-caffe. Could you look at it? – John Sep 18 '17 at 13:42