
I don't know the exact meaning of `iter_size` in the Caffe solver, even though I have googled a lot. The explanations I found always say that `iter_size` is a way to effectively increase the batch size without requiring extra GPU memory.

Can I understand it like this:

If I set batch_size = 10 and iter_size = 10, the behavior is the same as batch_size = 100.
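For reference, the accumulation knob lives in the solver configuration, while batch_size lives in the data layer of the net definition. A minimal sketch of the relevant solver.prototxt fields (the net file name and learning rate here are hypothetical; only iter_size is from my setup):

    net: "train_val.prototxt"  # hypothetical net definition file
    base_lr: 0.01              # hypothetical learning rate
    iter_size: 10              # accumulate gradients over 10 forward/backward passes
    # with batch_size: 10 in the training data layer, each weight update
    # then sees an effective 10 * 10 = 100 samples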

But I do two tests on this:

  1. Total samples = 6, batch_size = 6, iter_size = 1; training samples and test samples are the same. The loss and accuracy graph:

[loss and accuracy plot for test 1]

  2. Total samples = 6, batch_size = 1, iter_size = 6; training samples and test samples are the same. The loss and accuracy graph:

[loss and accuracy plot for test 2]

From these two tests, you can see that the two settings behave very differently.

So I must have misunderstood the true meaning of `iter_size`. But what can I do to make gradient descent behave the same as computing the gradient over all samples, rather than over mini-batches?
Could anybody give me some help?
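For what it's worth, the arithmetic I expected can be checked outside Caffe. Here is a minimal NumPy sketch (not Caffe code; the model and data are made up) showing that, for a plain model with no batch-dependent layers, averaging per-sample gradients over iter_size passes equals the full-batch gradient:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))   # 6 samples, 3 features
    y = rng.normal(size=6)
    w = np.zeros(3)

    def grad(w, Xb, yb):
        # gradient of the mean squared error over a (mini-)batch
        return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

    # batch_size = 6, iter_size = 1: one full-batch gradient
    g_full = grad(w, X, y)

    # batch_size = 1, iter_size = 6: accumulate six per-sample gradients, then average
    g_acc = np.mean([grad(w, X[i:i+1], y[i:i+1]) for i in range(6)], axis=0)

    print(np.allclose(g_full, g_acc))  # True: the two updates coincide

So for plain SGD the two settings should produce the same update; any divergence would have to come from something batch-dependent in the pipeline (e.g. layers that use per-batch statistics).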

  • Good question. I suspect `iter_size` might be used only for training and not for testing. Can you repeat the second experiment with `batch_size: 1` and `iter_size: 6` only for training, and leave `batch_size: 6` for the test phase? I would expect the train and test graphs to coincide after this change. – Shai Aug 18 '17 at 08:42
  • Sorry for not clarifying the parameters in the testing phase. In both experiments, batch_size is 6 in the testing phase and test_iter is 1. Only the training phase of the second experiment used batch_size: 1 with iter_size: 6. So I expected the two experiments to output the same loss and accuracy, but I was quite disappointed. @Shai – spider Aug 21 '17 at 09:32

0 Answers