
The error message is as follows:

CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 23.65 GiB total capacity; 21.91 GiB already allocated; 25.56 MiB free; 22.62 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have already tried reducing batch sizes and optimizing my code, but the issue persists. I would like to know how to address this problem and prevent the out-of-memory error. Additionally, I am unsure how to set the max_split_size_mb parameter mentioned in the error message.

Any guidance or suggestions on resolving this issue would be greatly appreciated. Thank you in advance!


Nick ODell

1 Answer


You could use the approach detailed in this blog to find a batch size that fits. Essentially, you can do the same thing manually by starting with your current batch size and halving it each time until the code runs, as in the sketch below.
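For example, a minimal version of that halving loop might look like the following, where run_one_step(batch_size) is a hypothetical stand-in for a single forward/backward pass of your own training code:

import torch

def find_workable_batch_size(run_one_step, start_batch_size=64):
    # run_one_step is a hypothetical callable that performs one
    # forward/backward pass with the given batch size and raises
    # a RuntimeError if the GPU runs out of memory.
    batch_size = start_batch_size
    while batch_size >= 1:
        try:
            run_one_step(batch_size)
            return batch_size              # this batch size fits on the GPU
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise                      # re-raise anything that is not an OOM error
            torch.cuda.empty_cache()       # free cached blocks before retrying
            batch_size //= 2               # halve and try again
    raise RuntimeError("even a batch size of 1 does not fit on this GPU")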

If you are using Google Colab, I have found that restarting the runtime / clearing the GPU between runs with different batch sizes also prevents OOM errors, even when the batch size is small enough to fit on the GPU. You can use code like:

import torch, gc
gc.collect()                 # release unreferenced Python objects
torch.cuda.empty_cache()     # return cached, unused GPU memory to the driver

You should also be sure that your GPU is big enough to fit your model. Find the size of your model (source):

from torchvision import models   # only needed for the example model below

model = models.resnet18()    # replace with your model

# Sum the bytes used by all parameters
param_size = 0
for param in model.parameters():
    param_size += param.nelement() * param.element_size()

# Sum the bytes used by all buffers (e.g. BatchNorm running statistics)
buffer_size = 0
for buffer in model.buffers():
    buffer_size += buffer.nelement() * buffer.element_size()

size_all_mb = (param_size + buffer_size) / 1024**2
print('model size: {:.3f}MB'.format(size_all_mb))
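
As a rough sanity check, you can compare that figure with the total memory PyTorch reports for your GPU; keep in mind that training also needs room for activations, gradients, and optimizer state, so the model alone should be well below the total:

import torch

total_mb = torch.cuda.get_device_properties(0).total_memory / 1024**2   # GPU 0
print('GPU total memory: {:.0f}MB'.format(total_mb))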
Priyank