
I am switching to training on GPUs and found that training crashes at a fairly arbitrary, and not very big, batch size. With 256x256 RGB images in a UNET, a batch of 32 causes an out-of-memory crash, while 16 works successfully. The amount of memory consumed surprised me, as I never ran into an out-of-memory error on a 16 GB RAM system. Is TensorFlow free to use swap?

How can I check the amount of total memory available on a GPU? Many guides online only look at memory used.
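One way to get the *total* (not just used) memory is to query `nvidia-smi` directly, since its `--query-gpu` flag can report `memory.total` per device. A minimal sketch (assumes an NVIDIA driver is installed so `nvidia-smi` is on the PATH):

```python
import subprocess

def parse_totals_mib(nvidia_smi_output: str) -> list:
    """Parse one total-memory value (MiB) per GPU line of nvidia-smi output."""
    return [int(line) for line in nvidia_smi_output.strip().splitlines()]

def total_gpu_memory_mib() -> list:
    """Query total VRAM of every visible GPU, in MiB, via nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_totals_mib(out)
```

With `noheader,nounits`, the command prints one bare integer per GPU, so the parsing is a one-liner.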

How does one estimate the memory needs? Image size (pixels \* channels \* dtype size) \* batch + parameter count \* float size?
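That back-of-the-envelope formula can be written out as a sketch. Note it is only a lower bound: during training each layer's activations are kept for backprop, and optimizers like Adam keep extra per-weight state. The `optimizer_slots=2` default below is an assumption matching Adam's two moment buffers:

```python
def train_memory_estimate_bytes(batch, height, width, channels,
                                n_params, dtype_bytes=4,
                                optimizer_slots=2):
    """Rough lower bound on training memory.

    input_batch counts only the input tensor; a UNET stores an
    activation map at every layer for backprop, so multiply by
    roughly the number of layers for a closer estimate.
    model_state counts weights + gradients + optimizer slots
    (Adam keeps two extra buffers per weight).
    """
    input_batch = batch * height * width * channels * dtype_bytes
    model_state = n_params * dtype_bytes * (2 + optimizer_slots)
    return input_batch + model_state
```

For a batch of 32 RGB 256x256 images in float32, the input tensor alone is 32 \* 256 \* 256 \* 3 \* 4 bytes, about 24 MiB; it is the per-layer activations and optimizer state that dominate.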

Many thanks, Bogdan

illan
  • It's GPU VRAM you are looking for, not system RAM. VRAM is specific to the type of GPU you use; you can also look it up with `nvidia-smi` if you use Ubuntu, and in Task Manager on Windows. I'm pretty sure you are missing space for gradients in your formula. – Eumel Sep 27 '22 at 12:01
  • @Eumel So nvidia-smi is required? Is there no way to use TensorFlow tools? I use Task Manager currently, but I need to get it programmatically to limit the batch size on an arbitrary system. As for the size calculation: my image data for a batch of 32 is 1 GB, and the saved model is <1 GB. I'm surprised that a batch of 32 exceeds the 12 GB of GPU memory. Is the model in training really 10x bigger than the saved model? – illan Sep 27 '22 at 15:40
  • 10x is a bit on the high side. But remember how deep learning works: you run a forward inference, store gradients, calculate an error, and backpropagate the error using the gradient. You need a gradient for each weight. And if your 1 GB model is an INT8 model, it needed 2\*FP32 per weight in training (since you round it to INT8 afterwards). So 8x is certainly possible. – MSalters Sep 27 '22 at 16:10
  • @MSalters I use a UNET with model.save(). I always assumed it saves floating point without rounding. So the model in memory would be weights + forward gradients + backprop gradients, maybe 3x the size of the saved .hd. Is this wrong? Does .hd do high compression? – illan Sep 27 '22 at 18:58
  • Depending on the optimizer you might need multiple gradient buffers, and .hd only saves weights, not your whole feature maps. Also, a simple Google search turns up: https://stackoverflow.com/questions/59567226/how-to-programmatically-determine-available-gpu-memory-with-tensorflow – Eumel Sep 28 '22 at 07:36

0 Answers