
When training with Keras + tensorflow-gpu, I have to set batch_size to 128; that is the largest batch the GPU will accept, and anything bigger gives an OOM error. My question is: with batch_size = 128, the batch of images is 128 * 224 * 224 * 3 * 4 bytes (the images are 224x224 with RGB channels, stored as 4-byte floats), which comes to roughly 77 MB in total. That seems far too small compared to the memory of the GPU. Is there any explanation for this?
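For reference, a quick sketch of that back-of-the-envelope calculation (assuming the images are fed as float32, the Keras/TensorFlow default):

```python
# Size of one input batch, assuming 224x224 RGB images stored as float32.
batch_size, height, width, channels = 128, 224, 224, 3
bytes_per_float32 = 4

batch_bytes = batch_size * height * width * channels * bytes_per_float32
print(f"Input batch: {batch_bytes / 1e6:.1f} MB")  # ~77.1 MB
```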

Xingyu Gu

2 Answers


You are forgetting three more things that also require GPU memory:

  1. Your model weights.

  2. Temporary variables used during the calculation of the gradients.

  3. Many other smaller allocations that the framework needs on top of that.

The first two take up a huge chunk of memory, which is why you run out even though the batch itself only consumes a few tens of megabytes.
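To put rough numbers on points 1 and 2, here is a minimal sketch; ResNet50 and Adam are only stand-ins, since the question does not say which architecture or optimizer is used.

```python
from tensorflow.keras.applications import ResNet50

# Assumed architecture: ResNet50 (hypothetical, the question doesn't say).
model = ResNet50(weights=None, input_shape=(224, 224, 3))

bytes_per_float32 = 4
weight_bytes = model.count_params() * bytes_per_float32  # ~100 MB of weights

# Gradients need one more float32 tensor per weight, and an optimizer such as
# Adam keeps two more (momentum and variance), so training roughly quadruples
# the weight-related memory before activations are even counted.
training_overhead_bytes = 3 * weight_bytes

print(f"Weights:           {weight_bytes / 1e6:.0f} MB")
print(f"Grad + Adam state: {training_overhead_bytes / 1e6:.0f} MB")
```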
Vikas NS

The images are stored as uint8, whereas the tensors are float64, which increases the size by eight times (four times for float32, the Keras default). The forward-pass activations, the gradients, and other tensors also use a significant chunk of memory.
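A quick illustration of the dtype point for a single 224x224 RGB image (plain NumPy, numbers only for illustration):

```python
import numpy as np

# One 224x224 RGB image at different dtypes.
img_uint8 = np.zeros((224, 224, 3), dtype=np.uint8)

print(img_uint8.nbytes)                     # 150528 bytes (~0.15 MB)
print(img_uint8.astype(np.float32).nbytes)  # 602112 bytes, 4x larger
print(img_uint8.astype(np.float64).nbytes)  # 1204224 bytes, 8x larger
```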

You can compute the memory required for your model as described here.
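For a rough idea of what such an estimate can look like, here is a sketch that sums the layer output sizes; ResNet50 and batch_size = 128 are only assumptions, and it ignores that frameworks reuse some buffers:

```python
import numpy as np
from tensorflow.keras.applications import ResNet50

# Assumed architecture: ResNet50 (hypothetical, the question doesn't say).
model = ResNet50(weights=None, input_shape=(224, 224, 3))
batch_size = 128
bytes_per_float32 = 4

# Sum the elements of every layer's output; during training these
# activations are kept in memory for backpropagation.
activation_elems = sum(
    int(np.prod([int(d) for d in layer.output.shape[1:]]))
    for layer in model.layers
)
activation_bytes = activation_elems * batch_size * bytes_per_float32

print(f"Forward activations: {activation_bytes / 1e6:.0f} MB per batch")
```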

Mohbat Tharani