Ideally (following the classic batch gradient descent method) you would compute the gradient over your whole dataset in a single batch. But that is too slow, and the dataset might not fit into memory. So instead we approximate the gradient with stochastic gradient descent, by splitting the dataset into mini-batches (see https://en.wikipedia.org/wiki/Stochastic_gradient_descent).
So the bigger the batch, the better the gradient approximation.
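To make the difference concrete, here is a minimal NumPy sketch (the toy dataset, learning rate, and batch size below are made-up values, not taken from your setup) that runs classic full-batch gradient descent next to mini-batch SGD on a least-squares problem:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 10))              # toy dataset: 1024 samples, 10 features
true_w = rng.normal(size=10)
y = X @ true_w + 0.1 * rng.normal(size=1024)

def grad(w, Xb, yb):
    """Gradient of the mean squared error 0.5 * mean((Xb @ w - yb) ** 2) w.r.t. w."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

lr = 0.1

# Classic (full-batch) gradient descent: one exact gradient per step.
w_gd = np.zeros(10)
for step in range(100):
    w_gd -= lr * grad(w_gd, X, y)

# Mini-batch SGD: each step uses a noisy gradient estimate from `batch_size` samples;
# a bigger batch gives a less noisy estimate of the full gradient.
w_sgd = np.zeros(10)
batch_size = 64
for epoch in range(10):
    perm = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        w_sgd -= lr * grad(w_sgd, X[idx], y[idx])

print("full-batch GD error: ", np.linalg.norm(w_gd - true_w))
print("mini-batch SGD error:", np.linalg.norm(w_sgd - true_w))
```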
To see the difference you have to compare by the number of optimizer steps, not by epochs: the bigger the batch size, the fewer steps per epoch. Right now you get an accuracy of 19% in 55 epochs with big batches and in 50 epochs with small batches, which is a similar result. But the small-batch run performed roughly 16 times more steps, and those steps took much more time (up to 16 times, because the cost of a single step typically does not shrink proportionally with the batch size).
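As a quick sanity check on the step counts (the dataset size and the concrete batch sizes below are placeholders picked for illustration; only the 16x ratio and the epoch counts come from the discussion above):

```python
dataset_size = 50_000                     # assumed; replace with your dataset size
small_batch, big_batch = 32, 512          # assumed sizes, keeping the 16x ratio

steps_small = 50 * (dataset_size // small_batch)  # 50 epochs with small batches
steps_big = 55 * (dataset_size // big_batch)      # 55 epochs with big batches

print(steps_small, steps_big, steps_small / steps_big)
# 78100 vs 5335 steps: the small-batch run does ~15x more parameter updates.
```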
Another important point: you can use a higher learning rate with big batches, which can further improve training time. In your case you could increase the learning rate by roughly a factor of 4.
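A factor of 4 matches the common square-root scaling heuristic for a 16x larger batch (the linear scaling rule would suggest 16x instead); the baseline values below are placeholders:

```python
base_lr = 0.01                            # assumed baseline learning rate
base_batch, big_batch = 32, 512           # assumed sizes, 16x ratio as above

linear_lr = base_lr * (big_batch / base_batch)        # linear rule: 16x -> 0.16
sqrt_lr = base_lr * (big_batch / base_batch) ** 0.5   # square-root rule: 4x -> 0.04

print(f"linear rule: {linear_lr:.2f}, square-root rule: {sqrt_lr:.2f}")
```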