
I am using a Caffe CNN for regression (see the figure below).

[figure: network architecture]

The values I want to predict lie in very different ranges, e.g. y1 ∈ [0.1, 0.2], y2 ∈ [1, 5], ..., yn ∈ [0, 15].

Q1: If I try to predict the 'y's as they are, will it mess up the learning? And if so, why? (I already ran this experiment and the results are OK, but not good.)

Q2: Can I rescale the 'y's to [0, 1] by normalizing them so that sum(ys) = 1?

Q3: Can I use another loss function, e.g. Softmax or Logistic, or is Euclidean my only option?

Nima

1 Answer


I don't have answers for you, but I can offer some observations.

If I understand your setup correctly, you have a loss function for each output y_i. Each loss is a regression loss forcing y_i into a particular range.

1. Since your outputs are "pulling" towards different ranges, this might cause the weight matrix of the last layer to have very different scales across its rows. If you are using a regularizer (like L2), this may "confuse" the learning process, which tries to keep the weights roughly isotropic.
To overcome this, you can relax the regularization on the last layer's weights (using the decay_mult parameter). Alternatively, you can add a "Scale" layer that learns just a scale factor (and possibly a bias as well) for each output.
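
A minimal prototxt sketch of this idea (layer and blob names like "fc_out" and "fc7" are hypothetical placeholders for your own net): weight decay is switched off for the last InnerProduct layer, and a per-output "Scale" layer is stacked on top:

```
layer {
  name: "fc_out"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc_out"
  # weights: learn as usual, but exclude from L2 weight decay
  param { lr_mult: 1 decay_mult: 0 }
  # bias: no decay either
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param { num_output: 3 }  # n = number of regression targets
}
layer {
  name: "out_scale"
  type: "Scale"
  bottom: "fc_out"
  top: "fc_scaled"
  # with a single bottom, "Scale" learns its own multiplier;
  # the default axis/num_axes give one scale (and bias) per output channel
  scale_param { bias_term: true }
}
```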

2. I don't understand what you are trying to accomplish by this. Are you trying to bound the outputs? You can get bounded outputs by applying a "Sigmoid" or "TanH" activation to each output, forcing each into the [0..1] or [-1..1] range, respectively. (You can add a "Scale" layer after the activation.)
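
A sketch of bounding each output and then letting the net learn the actual range (blob names are again placeholders; swap "Sigmoid" for "TanH" to get [-1..1]):

```
layer {
  name: "bound"
  type: "Sigmoid"
  bottom: "fc_out"
  top: "bounded"     # each output is now squashed into (0, 1)
}
layer {
  name: "range"
  type: "Scale"
  bottom: "bounded"
  top: "pred"
  scale_param { bias_term: true }  # learned per-output scale and shift
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
}
```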

3. You can use a logistic regression loss for each of the outputs, or explore the smooth L1 loss (which should be more robust, especially if the targets are not in the [-1..1] range).
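
Note that "SmoothL1Loss" is not part of stock BVLC Caffe; it ships with the Fast/Faster R-CNN forks. If you are on such a fork, the swap is just:

```
layer {
  name: "loss"
  type: "SmoothL1Loss"  # quadratic near zero, linear for large errors: less sensitive to outliers
  bottom: "pred"
  bottom: "label"
  top: "loss"
}
```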

Shai
  • Thanks for your answer @Shai. No, I do not have a separate loss for each output. I have a single Euclidean(Y, O) loss, with prediction Y = [y1, y2, .., yn] and ground-truth output O = [o1, o2, .., on]. Is this a problem? – Nima May 09 '18 at 07:51
  • @NimaHatami I suppose it's equivalent. On the other hand, having a different loss per output allows you to set a different loss_weight per output, thus "focusing" on the problematic components (see the sketch below these comments) – Shai May 09 '18 at 07:54
  • Answer to 2: yes, I am trying to bound the outputs to [0, 1]. I thought the different ranges could be avoided this way. – Nima May 09 '18 at 08:00
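
Following up on the loss_weight idea from the comments, here is a sketch of splitting a 2-output prediction with a "Slice" layer so each component gets its own "EuclideanLoss" and weight (names and weight values are illustrative; add more slice_points and tops for n > 2):

```
layer {
  name: "slice_pred"
  type: "Slice"
  bottom: "pred"
  top: "pred1"
  top: "pred2"
  slice_param { axis: 1 slice_point: 1 }  # split the channel dim after the first output
}
layer {
  name: "slice_label"
  type: "Slice"
  bottom: "label"
  top: "label1"
  top: "label2"
  slice_param { axis: 1 slice_point: 1 }
}
layer {
  name: "loss1"
  type: "EuclideanLoss"
  bottom: "pred1"
  bottom: "label1"
  top: "loss1"
  loss_weight: 1.0   # emphasize this component
}
layer {
  name: "loss2"
  type: "EuclideanLoss"
  bottom: "pred2"
  bottom: "label2"
  top: "loss2"
  loss_weight: 0.5   # down-weight this component
}
```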