When training with Keras + tensorflow-gpu, I have to set batch_size to 128; that is the largest batch the GPU will accept, and anything bigger gives an OOM error. My question: with batch_size = 128, the batch tensor is 128 × 224 × 224 × 3 × 4 bytes (224 × 224 RGB images in float32), about 77 MB in total, which seems tiny compared to the memory of the GPU. Is there any explanation for this?
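That figure can be checked directly with NumPy (a minimal sketch; note the product comes to roughly 77 MB):

```python
import numpy as np

# One batch of 128 RGB images at 224x224, stored as float32 (4 bytes/element)
batch = np.zeros((128, 224, 224, 3), dtype=np.float32)
print(batch.nbytes)  # 77070336 bytes, i.e. about 73.5 MiB
```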
2 Answers
You are forgetting three more things that also require GPU memory:

- your model weights;
- temporary variables (such as saved forward-pass activations) used during the calculation of the gradients;
- many other minor allocations made by the framework.

The first two take up a huge chunk of memory, which is why the GPU can run out even though the batch itself consumes only tens of megabytes.
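A rough back-of-the-envelope sketch of where the memory goes during one training step (plain Python, assuming float32 and plain SGD; the parameter and activation counts below are illustrative assumptions, not measurements):

```python
BYTES = 4  # float32

def training_memory_bytes(n_params, n_activations, batch_size):
    """Rough lower bound on GPU memory needed for one training step.

    n_params      -- trainable parameters in the model
    n_activations -- activation elements per sample kept for backprop
    batch_size    -- samples per batch
    """
    weights   = n_params * BYTES                     # model weights
    gradients = n_params * BYTES                     # one gradient per weight
    acts      = n_activations * batch_size * BYTES   # saved forward activations
    return weights + gradients + acts

# Illustrative numbers in the ballpark of a VGG16-sized model
# (~138M parameters, ~15M activation elements per 224x224x3 image):
print(training_memory_bytes(138_000_000, 15_000_000, 128) / 2**30, "GiB")
```

Even with these rough numbers the total lands in the multi-gigabyte range, dwarfing the ~77 MB batch; real frameworks add optimizer state and workspace buffers on top of this.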

BattleTested_закалённый в бою

Vikas NS
great explanation – Xingyu Gu Jul 18 '18 at 19:07
The image is uint8, while the tensor is float64, which increases the size by eight times. The forward pass, gradients, and other tensors also use a significant chunk of memory.
You can compute the memory required for your model as given here
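The dtype blow-up is easy to demonstrate with NumPy (a minimal sketch; note that Keras actually defaults to float32, a 4× increase, so the 8× float64 figure above is the worst case):

```python
import numpy as np

img_u8  = np.zeros((224, 224, 3), dtype=np.uint8)  # 1 byte per element
img_f64 = img_u8.astype(np.float64)                # 8 bytes per element

print(img_u8.nbytes)                    # 150528 bytes
print(img_f64.nbytes // img_u8.nbytes)  # 8
```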

Mohbat Tharani