
I thought that batch size only mattered for performance: the bigger the batch, the more images are processed at the same time while training my net. But I noticed that if I change my batch size, my net's accuracy changes as well. So I clearly don't understand what batch size really is. Can someone explain to me what batch size does?

Shai
Pasdf
  • As far as I know, the batch size is the number of images fetched from the hard drive while the machine is busy computing on the previously pre-fetched data. Through this technique, Caffe tries to hide the read time from the hard drive. – Saeed Nov 13 '15 at 14:48
  • But if I change my batch size, I get better accuracy. I don't understand that. – Pasdf Nov 13 '15 at 15:47

1 Answer


Caffe trains using Stochastic Gradient Descent (SGD): at each iteration it computes a (stochastic) estimate of the gradient of the loss w.r.t. the parameters over the training data and takes a step (= changes the parameters) in the negative gradient direction.
Now, if you write out the equations of the gradient over the training data, you'll notice that computing the gradient exactly requires evaluating all of your training data at every iteration: this is prohibitively time consuming, especially as the training set grows bigger and bigger.
To overcome this, SGD approximates the exact gradient, in a stochastic manner, by sampling only a small portion of the training data at each iteration. This small portion is the batch.
Thus, the larger the batch size, the more accurate the gradient estimate at each iteration.

TL;DR: the batch size affects the accuracy of the estimated gradient at each iteration; changing the batch size therefore affects the "path" the optimization takes and may change the outcome of training.
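You can see this batch-size/gradient-noise trade-off directly in a toy experiment (not from Caffe itself; a NumPy sketch with made-up 1-D regression data): the mini-batch gradient is an average over sampled examples, so its spread around the exact gradient shrinks as the batch grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D linear regression: per-sample loss is (w*x - y)^2,
# so the per-sample gradient w.r.t. w is 2*x*(w*x - y).
N = 10_000
x = rng.normal(size=N)
y = 3.0 * x + rng.normal(scale=0.5, size=N)

def full_gradient(w):
    """Exact gradient: averages over the entire training set."""
    return np.mean(2 * x * (w * x - y))

def batch_gradient(w, batch_size):
    """SGD-style estimate: averages over a random mini-batch."""
    idx = rng.choice(N, size=batch_size, replace=False)
    return np.mean(2 * x[idx] * (w * x[idx] - y[idx]))

w = 0.0
print("exact gradient:", full_gradient(w))
for bs in (8, 64, 512):
    estimates = [batch_gradient(w, bs) for _ in range(1000)]
    # The mean of the estimates stays near the exact gradient,
    # but their standard deviation shrinks as batch_size grows.
    print(f"batch={bs:4d}  std of estimate: {np.std(estimates):.4f}")
```

Each estimate is unbiased regardless of batch size; what changes is the noise, which is exactly what perturbs the "path" the optimizer takes.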


Update:
At the ICLR 2018 conference an interesting work was presented:
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le, Don't Decay the Learning Rate, Increase the Batch Size.
This work essentially relates the effect of changing the batch size to the effect of changing the learning rate.
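To make the paper's idea concrete, here is a hypothetical schedule (the function name, milestones, and base values are all made up for illustration): instead of decaying the learning rate by 10x at fixed milestones, you grow the batch size by 10x at the same milestones and leave the learning rate alone.

```python
def schedule(epoch, base_lr=0.1, base_batch=256, milestones=(30, 60)):
    """Return (decayed_lr, grown_batch) for a given epoch.

    Conventional training decays the learning rate at each milestone;
    the Smith et al. alternative grows the batch size instead.
    """
    k = sum(epoch >= m for m in milestones)  # milestones passed so far
    decayed_lr = base_lr / (10 ** k)         # conventional: shrink the step
    grown_batch = base_batch * (10 ** k)     # alternative: shrink the noise
    return decayed_lr, grown_batch

print(schedule(0))    # before any milestone: (0.1, 256)
print(schedule(35))   # after the first milestone: (0.01, 2560)
```

Both moves reduce the scale of the random fluctuations in the SGD update, which is why they have a similar effect on training.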

Shai
  • @Shai - So should `batch size` be small or large? I run into `waiting for data` many times, and it affects my training time for the same `AlexNet` run twice, one after the other. – Chetan Arvind Patil Feb 01 '18 at 20:55