
I trained my network on a dataset and got the following training loss vs. iterations curve:

[Figure: training loss vs. iterations; the loss spikes sharply at points marked with a red arrow]

As you can see, the loss spikes sharply at some points (marked by the red arrow). I am using the Adam solver with a learning rate of 0.001, a momentum of 0.9, and a weight decay of 0.0005, without dropout. My network uses BatchNorm, Pooling, and Conv layers. Based on the figure above, could you suggest what my problem is and how to fix it? Thanks all
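For reference, a `solver.prototxt` matching this setup would look roughly like the sketch below (the net file name, `max_iter`, and snapshot settings are placeholders, not my actual values):

```
# Sketch of a Caffe solver matching the setup described above.
# The net file name, max_iter, and snapshot settings are placeholders.
net: "train_val.prototxt"   # hypothetical network definition file
type: "Adam"
base_lr: 0.001              # learning rate from the question
momentum: 0.9               # Adam's beta1
momentum2: 0.999            # Adam's beta2 (Caffe's default)
weight_decay: 0.0005
lr_policy: "fixed"          # fixed policy, as discussed in the comments
display: 100
max_iter: 100000            # placeholder
snapshot: 5000
snapshot_prefix: "snapshot"
solver_mode: GPU
```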

Update: here is a more detailed figure: [Figure: more detailed view of the training loss vs. iterations]

John
  • what is your [learning rate policy](https://stackoverflow.com/questions/30033096/what-is-lr-policy-in-caffe/30045244#30045244)? – Shai Jun 04 '17 at 11:42
  • what is the scale of your loss and iterations? So we can actually see how big the spike is... – Thomas Wagenaar Jun 04 '17 at 11:48
  • Sorry for the late reply. I am using the Adam method, so the learning rate policy is fixed. My scale is 1000 on the x-axis and 0.2 on the y-axis – John Jun 04 '17 at 12:30
  • @Shai: Do you agree that the learning rate policy must be fixed with the Adam method? – John Jun 04 '17 at 13:00
  • @user8264 I came across instances that used `step` policy with `"Adam"` solvers... – Shai Jun 04 '17 at 13:01
  • In terms of training epochs, when does the spike happen? After how many epochs? Or is it during the first one? – Shai Jun 04 '17 at 13:02
  • I am using Caffe; I guess about 1-2 epochs – John Jun 04 '17 at 13:04
  • Could you show me a reference link that uses the step policy with Adam? – John Jun 04 '17 at 13:05
  • can you be more accurate about the number of epochs? have you shuffled the training set? – Shai Jun 04 '17 at 13:08
  • Yes, I tried again but the same thing happens. Actually, I am using Caffe, so it only reports iterations instead of epochs. I used shuffling in training – John Jun 04 '17 at 13:13
  • if you know the size of your training set and the training batch size, you can easily convert iterations to epochs. – Shai Jun 04 '17 at 13:18
  • I am using a batch size of 4 and the training set has 80,000 images. So, 1 epoch = 32k iterations. Is that right? – John Jun 04 '17 at 13:23
  • if you process 4 samples/iteration and you have 80k samples, then 1 epoch = 80k/4 = 20k iterations. After how many iterations does the loss "spike"? – Shai Jun 04 '17 at 15:20
  • Thanks. In my figure, sometimes it happens after 10k iterations, sometimes after 5k iterations. I don't know what is happening; it is totally random – John Jun 04 '17 at 15:38
  • I updated the question. Please take a look – John Jun 04 '17 at 15:43
  • @Shai: Can you guess what my issue is? – John Jun 05 '17 at 11:46
  • it seems like it all happens during the first epoch. If you shuffle the training set again, do you see the same phenomenon? – Shai Jun 05 '17 at 11:48
  • I am using shuffle in the TRAIN phase: `hdf5_data_param { source: "./list.txt" batch_size: 8 shuffle: true }` (expanded into a full layer sketch after these comments) – John Jun 05 '17 at 11:50
  • One more thing I want to mention: if I use a momentum of 0.99, I do not see the spikes, but the performance is worse than with a momentum of 0.9. I guess momentum 0.99 converges to a local minimum – John Jun 05 '17 at 12:02
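For completeness, here is the `hdf5_data_param` snippet quoted in the comments, expanded into a full TRAIN-phase layer definition (a sketch; the layer name and `top` names are assumptions and must match the dataset names inside the HDF5 files):

```
# Hypothetical expansion of the data layer quoted in the comments.
layer {
  name: "data"
  type: "HDF5Data"
  top: "data"              # must match a dataset name in the HDF5 files
  top: "label"             # likewise an assumption
  include { phase: TRAIN }
  hdf5_data_param {
    source: "./list.txt"   # text file listing the HDF5 files
    batch_size: 8          # batch size from the quoted snippet
    shuffle: true          # reshuffle the data ordering
  }
}
```

As noted in the comments, with a batch size of 4 and 80,000 training images, one epoch corresponds to 80,000 / 4 = 20,000 iterations.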

0 Answers