
I understand from here that a bigger batch size gives more accurate results, but I'm not sure which batch size is "good enough". I guess bigger batch sizes will always be better, but it seems like at a certain point you only get a slight improvement in accuracy for each increase in batch size. Is there a heuristic or rule of thumb for finding the optimal batch size?

Currently, I have 40,000 training samples and 10,000 test samples. My batch sizes are the defaults: 256 for training and 50 for testing. I am using an NVIDIA GTX 1080, which has 8 GB of memory.

MoneyBall

2 Answers


Test-time batch size does not affect accuracy; you should set it to the largest value that fits in memory so that the validation step takes less time.
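
If you want to find that largest value empirically, one option is to probe for out-of-memory errors. A minimal sketch, assuming PyTorch, a CUDA device, and a hypothetical `model` and `input_shape` (none of which appear in the question):

```python
import torch

def max_eval_batch_size(model, input_shape, device="cuda", start=64, limit=65536):
    """Double the evaluation batch size until a forward pass runs out of GPU
    memory, then return the last size that worked (gradients disabled)."""
    model = model.to(device).eval()
    best, bs = None, start
    while bs <= limit:
        try:
            with torch.no_grad():
                model(torch.zeros((bs, *input_shape), device=device))
            best = bs
            bs *= 2
        except RuntimeError as err:       # CUDA OOM surfaces as a RuntimeError
            if "out of memory" in str(err):
                torch.cuda.empty_cache()
                break
            raise
    return best
```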

As for train-time batch size, you are right that larger batches yield more stable training. However, larger batches will slow training down significantly. Moreover, you will have fewer backprop updates per epoch. So you do not want the batch size to be too large. Using the default values is usually a good strategy.
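
To make the "fewer updates per epoch" point concrete with the numbers from the question (40,000 training samples), here is a quick back-of-the-envelope calculation in Python:

```python
import math

n_train = 40_000                                  # training set size from the question
for batch_size in (32, 50, 128, 256, 512, 1024):
    updates = math.ceil(n_train / batch_size)     # weight updates per epoch
    print(f"batch_size={batch_size:5d} -> {updates:5d} updates/epoch")

# e.g. batch_size=256 gives 157 updates/epoch, batch_size=1024 only 40:
# quadrupling the batch size cuts the number of updates per epoch by ~4x.
```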

Shai
    Gotcha. Thank you. – MoneyBall Apr 30 '17 at 06:46
  • I think a useful range could be `[number of labels, batch size your memory can hold]`. – nomem Apr 30 '17 at 11:01
  • @lnman for ImageNet the number of labels is 1000… I don't think this is reasonable – Shai Apr 30 '17 at 11:11
  • Yes, I know. That's why I said *useful range* and included the `memory can hold` part. – nomem Apr 30 '17 at 11:14
  • @lnman suppose you have a very large memory, would you set 'batch_size:1000'? I don't think so. I think 256 is a very *large* batch size to begin with. From my *limited* experience, I think ~50 is more like a normal size. – Shai Apr 30 '17 at 11:28
  • As the `mini-batch` gradient is an estimator of the full-`batch` gradient, it surely helps convergence if each batch covers all types of examples. So why not use `1000` if you have enough memory? On recent GPUs a large batch size can be computed efficiently and parallelized over many GPUs, and it will have lower variance (see the sketch after these comments). – nomem Apr 30 '17 at 12:18
  • @lnman recently there have been several publications that work successfully with very large batch sizes, so I guess my tendency towards smaller batches is not a good practice. – Shai Sep 30 '17 at 22:14
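
To illustrate the estimator point with a toy example (my own sketch, using NumPy and synthetic data; nothing here comes from the original post), the spread of the mini-batch gradient around the full-batch gradient shrinks roughly as 1/sqrt(batch size):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: gradient of a mean-squared-error loss on synthetic data.
X = rng.normal(size=(40_000, 10))
y = rng.normal(size=40_000)
w = rng.normal(size=10)

def grad(idx):
    """Gradient of 0.5 * mean((X w - y)^2) over the given sample indices."""
    err = X[idx] @ w - y[idx]
    return X[idx].T @ err / len(idx)

full = grad(np.arange(len(X)))                    # full-batch ("true") gradient
for bs in (16, 64, 256, 1024):
    # Draw many mini-batches and measure how far their gradients scatter
    # around the full-batch gradient; the spread falls roughly as 1/sqrt(bs).
    grads = np.stack([grad(rng.choice(len(X), size=bs, replace=False))
                      for _ in range(200)])
    spread = np.linalg.norm(grads - full, axis=1).mean()
    print(f"batch_size={bs:5d}  mean deviation from full gradient = {spread:.4f}")
```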

See my master's thesis, page 59, for some of the reasons to choose a bigger or smaller batch size. You want to look at (a rough sweep sketch follows this list):

  • epochs until convergence
  • time per epoch: lower is better
  • resulting model quality: lower loss is better (in my experiments)
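
One way to compare these criteria empirically is a small batch-size sweep. A rough sketch, assuming a Keras-style `fit()` API, a hypothetical `build_model()` that returns a freshly compiled model with an accuracy metric, and pre-loaded data arrays (none of these come from the answer or the thesis):

```python
import time

def sweep_batch_sizes(build_model, x_train, y_train, x_val, y_val,
                      batch_sizes=(16, 32, 64, 128, 256), epochs=20):
    """Train one Keras-style model per batch size and record time per epoch
    and final validation accuracy, so the trade-offs above can be compared."""
    results = []
    for bs in batch_sizes:
        model = build_model()                     # fresh, compiled model for each run
        start = time.time()
        history = model.fit(x_train, y_train,
                            validation_data=(x_val, y_val),
                            batch_size=bs, epochs=epochs, verbose=0)
        results.append({
            "batch_size": bs,
            "time_per_epoch_s": (time.time() - start) / epochs,
            "final_val_accuracy": history.history["val_accuracy"][-1],
        })
    return results
```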

A batch size of 32 was good for my datasets / models / training algorithm.

Martin Thoma