
I recently ran my CNN under various batch sizes and noticed that the smaller the batch size (32, 64), the higher the accuracy, but the graphs looked like this:

[Loss graph]

[Accuracy graph]

Can anyone explain why the graphs don't look normal? My training data has 4096 features. Here are my graphs for my larger batch sizes (512, 1024):

[Loss and accuracy graphs for the larger batch sizes]

  • In my experience a smaller batch_size usually performs better than a bigger batch_size. – yudhiesh Feb 19 '21 at 05:38
  • Can you explain what you mean by "graphs don't look normal"? – NotAName Feb 19 '21 at 05:42
  • I looked it up, and according to this page https://stackoverflow.com/questions/46654424/how-to-calculate-optimal-batch-size#answer-46656508:~:text=Max%20batch%20size%3D%20available%20GPU%20memory%20bytes%20%2F%204%20%2F%20(size%20of%20tensors%20%2B%20trainable%20parameters) the larger the batch size, the better (a sketch of that rule of thumb follows these comments). –  Feb 19 '21 at 05:50
  • For all of the CNNs I have trained, the loss and accuracy graphs had the shape of the large-batch-size graphs. I have never seen the shape of the small-batch-size graphs, so I am somewhat skeptical. –  Feb 19 '21 at 05:51
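
A minimal sketch of the rule of thumb cited in the comment above, with all numbers hypothetical (the formula itself comes from the linked answer):

```python
# Rule of thumb from the linked answer:
#   max batch size = available GPU memory bytes / 4 / (size of tensors + trainable parameters)
gpu_memory_bytes = 8 * 1024**3   # hypothetical 8 GB GPU
tensor_size = 4096               # hypothetical per-sample tensor size
trainable_params = 1_000_000     # hypothetical number of model parameters

max_batch_size = gpu_memory_bytes / 4 / (tensor_size + trainable_params)
print(int(max_batch_size))       # ~2138 for these made-up numbers
```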

1 Answer


Ideally (according to the classic gradient descent method) you would use a single batch: the whole of your dataset. But that is too slow, and your dataset might not fit into memory. So we use an approximation of the gradient (the stochastic gradient descent method) by splitting the dataset into batches (see https://en.wikipedia.org/wiki/Stochastic_gradient_descent).

So the bigger the batch, the better the approximation of the gradient.
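
For intuition, here is a minimal sketch (plain NumPy on a hypothetical least-squares problem, not your CNN) of how splitting the dataset into batches turns the exact gradient into a noisy estimate whose noise shrinks as the batch grows:

```python
import numpy as np

# Hypothetical toy problem: linear least squares. The point is only to
# show how mini-batch SGD approximates the full-batch gradient.
rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 10))        # 4096 samples, 10 features
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=4096)

def batch_gradient(w, Xb, yb):
    # Gradient of the mean squared error on one batch (Xb, yb).
    # With the whole dataset this is exact (classic) gradient descent;
    # with a small batch it is a noisy estimate of the same quantity.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(10)
lr = 0.01
batch_size = 32                        # compare e.g. 32 vs. 1024 here

for epoch in range(10):
    perm = rng.permutation(len(y))     # reshuffle every epoch
    for start in range(0, len(y), batch_size):
        idx = perm[start:start + batch_size]
        w -= lr * batch_gradient(w, X[idx], y[idx])
```

With batch_size = 4096 every step uses the exact gradient; with batch_size = 32 each step is cheap but noisy, which is one likely reason your small-batch curves look so jagged.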

To see the difference you have to compare by the number of steps, not by epochs: the bigger the batch size, the fewer steps per epoch. You reached an accuracy of 19% in 55 epochs with big batches and in 50 epochs with small batches, which is similar. But in the small-batch case you have done roughly 16 times more gradient steps (32 vs. 512 is a factor of 16 in batch size), which took much more time (up to 16 times).
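
As a back-of-the-envelope check (the dataset size here is hypothetical; substitute your own):

```python
n_samples = 50_000                  # hypothetical; use your dataset size

for batch_size, epochs in [(32, 50), (512, 55)]:
    steps_per_epoch = -(-n_samples // batch_size)   # ceiling division
    total_steps = epochs * steps_per_epoch
    print(f"batch={batch_size:4d}  epochs={epochs}  total steps={total_steps}")
# batch=  32  epochs=50  total steps=78150
# batch= 512  epochs=55  total steps=5390
```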

Another important point: you can use a higher learning rate with big batches, which can further improve training time. In your case you can increase the learning rate by a factor of about 4.
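
One common heuristic that matches this factor is square-root scaling of the learning rate with the batch size: sqrt(512 / 32) = 4 (a rule of thumb, not a guarantee; linear scaling is another common choice):

```python
base_lr = 1e-3              # hypothetical learning rate tuned for batch size 32
base_batch, new_batch = 32, 512

# Square-root scaling: multiply the learning rate by sqrt(new / old)
scaled_lr = base_lr * (new_batch / base_batch) ** 0.5
print(scaled_lr)            # 0.004, i.e. 4x the base learning rate
```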

Andrey