
I am training AlexNet on my own data using Caffe. One issue I see is that the "Train net output" loss and the "iteration loss" are nearly the same during training. Moreover, this loss fluctuates, for example:

...
...Iteration 900, loss 0.649719
...    Train net output #0: loss = 0.649719 (* 1 = 0.649719 loss ) 
...    Iteration 900, lr = 0.001
...Iteration 1000, loss 0.892498
...    Train net output #0: loss = 0.892498 (* 1 = 0.892498 loss ) 
...    Iteration 1000, lr = 0.001
...Iteration 1100, loss 0.550938
...    Train net output #0: loss = 0.550944 (* 1 = 0.550944 loss ) 
...    Iteration 1100, lr = 0.001
...
  1. Should I expect this fluctuation?
  2. As you can see, the differences between the reported losses are not significant. Does this indicate a problem with my training?

My solver is:

net: "/train_val.prototxt"
test_iter: 1999
test_interval: 10441
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 100000
display: 100
max_iter: 208820
momentum: 0.9
weight_decay: 0.0005
snapshot: 10441
snapshot_prefix: "/caffe_alexnet_train"
solver_mode: GPU

1 Answer

  1. Caffe uses the Stochastic Gradient Descent (SGD) method to train the net. In the long run the loss decreases; locally, however, it is perfectly normal for the loss to fluctuate a bit.
  2. The reported "iteration loss" is the weighted sum of all the loss layers of your net, averaged over `average_loss` iterations. On the other hand, the reported "train net output..." reports each net output from the current iteration only.
    In your example, you did not set `average_loss` in your 'solver', and thus `average_loss=1` by default. Since you only have one loss output with `loss_weight=1`, the reported "train net output..." and "iteration loss" are the same (up to display precision). If you want a smoother reported loss, you can set `average_loss` in your solver (see the sketch below).

To conclude: your output is perfectly normal.
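
For illustration, a minimal sketch of how `average_loss` could be added to the solver shown in the question; the value 100 (matching `display`) is just an example choice, not something from the original post:

display: 100
average_loss: 100   # report the loss averaged over the last 100 iterations

With this setting, the "iteration loss" line becomes a running average over the last 100 iterations, while the per-iteration "Train net output" line stays unchanged.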

  • Thanks a lot @Shai for the answer. Could you please tell me how I can define that `average_loss` in my `solver` too? I checked it in `googlenet` and it was like `average_loss: 40`. Is that an initial value for it? – user6726469 Oct 18 '16 at 10:51
  • @user6726469 the default value is 1. I usually set it to the same value as the `display` parameter. It's up to you to decide what interval to average over. – Shai Oct 18 '16 at 10:53