I am relatively new in Deep learning and its framework. Currently, I am experimenting with Caffe framework and trying to fine tune the Vgg16_places_365.
I am using the Amazone EC2 instance g2.8xlarge with 4 GPUs (each has 4 GB of RAM). However, when I try to train my model (using a single GPU), I got this error:
Check failed: error == cudaSuccess (2 vs. 0) out of memory
After I did some research, I found that one of the ways to solve this out of memory problem is by reducing the batch size in my train.prototxt
Caffe | Check failed: error == cudaSuccess (2 vs. 0) out of memory.
Initially, I set the batch size into 50, and iteratively reduced it until 10 (since it worked when batch_size = 10). Now, the model is being trained and I am pretty sure it will take quite long time. However, as a newcomer in this domain, I am curious about the relation between this batch size and another parameter such as the learning rate, stepsize and even the max iteration that we specify in the solver.prototxt.
How significant the size of the batch will affect the quality of the model (like accuracy may be). How the other parameters can be used to leverage the quality. Also, instead of reducing the batch size or scale up my machine, is there another way to fix this problem?