
I have a fully connected layer followed by a SoftmaxWithLoss layer. What I am trying to do is to get the output back in 3D form rather than 1D. My ground-truth input images are 3x128x128 and my last layers look like this:

layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "conv"
  top: "fc1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 49152 # 128 x 128 x 3
     ...
  }
}

layer {
  name: "result"
  type: "SoftmaxWithLoss"
  bottom: "fc1"
  bottom: "label"
  top: "result"
}

And here I get the following error:

softmax_loss_layer.cpp:47] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (1 vs. 49152) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be NHW, with integer values in {0, 1, ..., C-1}.

What is wrong here? My label is 3x128x128 and my num_output is 49152 = 3 x 128 x 128?

My follow-up question would be how to transform this 1D data back into 3D data:

I am using the Python API for Caffe. I know I "just" have to reshape the 1D vector into a 3D volume. But how do I know which location in the 1D vector corresponds to which location in the 3D volume? Can anyone help me?

Thanks in advance!

1 Answer


Since you are looking for a pixel-wise prediction, with the ground-truth image as the label, it would be better to use a EuclideanLoss layer instead of SoftmaxWithLoss. Softmax is normally used for multi-class classification, where each label is a single integer class index. It might be possible to use softmax in your case too, but only after changing the format of the labels (one class index per pixel instead of a 3-channel image), as the error message already hints.
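Below is a minimal sketch of that swap using Caffe's Python net-spec API, since you mention using pycaffe. The DummyData inputs and the layer names are stand-ins I chose for illustration; your real data and conv layers go in their place:

import caffe
from caffe import layers as L

n = caffe.NetSpec()
# Stand-in inputs: a 3x128x128 image and a 3x128x128 ground-truth label.
n.data, n.label = L.DummyData(
    shape=[dict(dim=[1, 3, 128, 128]), dict(dim=[1, 3, 128, 128])],
    ntop=2)
# Same fc1 as in the question: one output per label element (3*128*128).
n.fc1 = L.InnerProduct(n.data, num_output=49152,
                       weight_filler=dict(type='xavier'))
# EuclideanLoss only checks that both bottoms have the same count per
# item, so the (1, 49152) prediction is compared element-wise against
# the (1, 3, 128, 128) label in flattened order.
n.loss = L.EuclideanLoss(n.fc1, n.label)
print(n.to_proto())  # emits the equivalent prototxt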

When computing the Euclidean loss, Caffe treats the labels as a flat 1D array, and the same applies to the predicted output. The 3D-to-1D flattening is row-major: width varies fastest, then height, then channels, so the element at (c, h, w) of a (C, H, W) volume lands at flat index (c * H + h) * W + w.
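That mapping also answers your follow-up question: a plain row-major reshape (numpy's default order) recovers the 3D volume. Here is a small self-contained check of the index mapping; the array names are mine, purely for illustration:

import numpy as np

C, H, W = 3, 128, 128
# Encode every (c, h, w) position with a unique value.
vol = np.arange(C * H * W).reshape(C, H, W)

flat = vol.ravel()        # row-major flattening: w fastest, then h, then c
c, h, w = 2, 17, 90
k = (c * H + h) * W + w   # flat index of element (c, h, w)
assert flat[k] == vol[c, h, w]

# The inverse: 1D prediction back to a 3D volume.
restored = flat.reshape(C, H, W)
assert (restored == vol).all()

With pycaffe that means something like `net.blobs['fc1'].data[0].reshape(3, 128, 128)`, assuming your labels were flattened in this same order.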

Your model will learn to predict outputs in whatever layout the labels use; i.e., if the channels in the labels are inverted, the model will end up learning the inverted form. This is because the preceding layer is a fully connected layer, which imposes no spatial structure of its own.

You can read more about Softmax here.

A similar answer with a more detailed explanation is here.

Anoop K. Prabhu
  • Thank you for your answer. The reason why I asked this question is that I was not able to produce good enough results with the regression task as described [link](http://stackoverflow.com/questions/40588551/caffe-how-to-convert-network-from-pixel-wise-segmentation-to-pixel-wise-regress?noredirect=1#comment68451095_40588551) here. In the future I need to have a 3D output, i.e. at least 2 channels x height x width. Could you have a look at that question? That would be much appreciated –  Nov 15 '16 at 13:44
  • What I do not understand is what you mean by changing the format of the labels. It would be the same for regression and softmax: I have an image with values in [0, 255], so what would be different? –  Nov 15 '16 at 20:34
  • What would be the difference between using a `Convolution` layer as the last layer rather than `InnerProduct`? When I use a `Convolution` layer I get a 2D output, which is perfect since I do not have to reshape it. –  Dec 06 '16 at 12:00