
I recently ran my CNN under various batch sizes and noticed that the smaller the batch size (32, 64), the higher the accuracy, but the graphs looked like this:

[Loss graph]

[Accuracy graph]

Can anyone explain why the graphs don't look normal? My training data has 4096 features. Here are my graphs for my larger batch sizes (512, 1024):

[Loss and accuracy graphs for the larger batch sizes]

  • In my experience a smaller batch_size usually performs better than a bigger batch_size. – yudhiesh Feb 19 '21 at 05:38
  • Can you explain what you mean by "graphs don't look normal"? – NotAName Feb 19 '21 at 05:42
  • I looked it up, and according to this page https://stackoverflow.com/questions/46654424/how-to-calculate-optimal-batch-size#answer-46656508:~:text=Max%20batch%20size%3D%20available%20GPU%20memory%20bytes%20%2F%204%20%2F%20(size%20of%20tensors%20%2B%20trainable%20parameters) the larger the batch size, the better (a sketch of that rule of thumb follows these comments). –  Feb 19 '21 at 05:50
  • For all of the CNNs I have trained, the loss and accuracy graphs had the shape of the large-batch-size graphs. I have never seen the shape of the small-batch-size graphs, so I am somewhat skeptical. –  Feb 19 '21 at 05:51
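
A minimal sketch of the rule of thumb cited in the comment above, with all numbers hypothetical (the formula itself comes from the linked answer):

```python
# Rule of thumb from the linked answer:
#   max batch size = available GPU memory bytes / 4 / (size of tensors + trainable parameters)
gpu_memory_bytes = 8 * 1024**3   # hypothetical 8 GB GPU
tensor_size = 4096               # hypothetical per-sample tensor size
trainable_params = 1_000_000     # hypothetical number of model parameters

max_batch_size = gpu_memory_bytes / 4 / (tensor_size + trainable_params)
print(int(max_batch_size))       # ~2138 for these made-up numbers
```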

1 Answer


Ideally (according to the classic gradient descent method) you would use a single batch: the whole of your dataset. But that is too slow, and your dataset might not fit into memory. So we use an approximation of the gradient (the stochastic gradient descent method) by splitting the dataset into batches (see https://en.wikipedia.org/wiki/Stochastic_gradient_descent).

So the bigger the batch, the better the approximation of the gradient.
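
For intuition, here is a minimal sketch (plain NumPy on a hypothetical least-squares problem, not your CNN) of how splitting the dataset into batches turns the exact gradient into a noisy estimate whose noise shrinks as the batch grows:

```python
import numpy as np

# Hypothetical toy problem: linear least squares. The point is only to
# show how mini-batch SGD approximates the full-batch gradient.
rng = np.random.default_rng(0)
X = rng.normal(size=(4096, 10))        # 4096 samples, 10 features
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=4096)

def batch_gradient(w, Xb, yb):
    # Gradient of the mean squared error on one batch (Xb, yb).
    # With the whole dataset this is exact (classic) gradient descent;
    # with a small batch it is a noisy estimate of the same quantity.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(10)
lr = 0.01
batch_size = 32                        # compare e.g. 32 vs. 1024 here

for epoch in range(10):
    perm = rng.permutation(len(y))     # reshuffle every epoch
    for start in range(0, len(y), batch_size):
        idx = perm[start:start + batch_size]
        w -= lr * batch_gradient(w, X[idx], y[idx])
```

With batch_size = 4096 every step uses the exact gradient; with batch_size = 32 each step is cheap but noisy, which is one likely reason your small-batch curves look so jagged.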

To see the difference you have to compare by the number of steps, not by epochs: the bigger the batch size, the fewer steps per epoch. You reached an accuracy of 19% in 55 epochs with big batches and in 50 epochs with small batches, which is similar. But in the small-batch case you have done roughly 16 times more gradient steps (32 vs. 512 is a factor of 16 in batch size), which took much more time (up to 16 times).
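
As a back-of-the-envelope check (the dataset size here is hypothetical; substitute your own):

```python
n_samples = 50_000                  # hypothetical; use your dataset size

for batch_size, epochs in [(32, 50), (512, 55)]:
    steps_per_epoch = -(-n_samples // batch_size)   # ceiling division
    total_steps = epochs * steps_per_epoch
    print(f"batch={batch_size:4d}  epochs={epochs}  total steps={total_steps}")
# batch=  32  epochs=50  total steps=78150
# batch= 512  epochs=55  total steps=5390
```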

Another important point: you can use a higher learning rate with big batches, which can further improve training time. In your case you can increase the learning rate by a factor of about 4.
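
One common heuristic that matches this factor is square-root scaling of the learning rate with the batch size: sqrt(512 / 32) = 4 (a rule of thumb, not a guarantee; linear scaling is another common choice):

```python
base_lr = 1e-3              # hypothetical learning rate tuned for batch size 32
base_batch, new_batch = 32, 512

# Square-root scaling: multiply the learning rate by sqrt(new / old)
scaled_lr = base_lr * (new_batch / base_batch) ** 0.5
print(scaled_lr)            # 0.004, i.e. 4x the base learning rate
```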

Andrey