
I found this problem running a neural network on Colab Pro+ (with the high RAM option).

RuntimeError: CUDA out of memory. Tried to allocate 8.00 GiB (GPU 0; 15.90 GiB total capacity; 12.04 GiB already allocated; 2.72 GiB free; 12.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have already decreased the batch size to 2. I load the data from HDF5 files using h5py.

At this point, I assume the only thing left to try is setting max_split_size_mb, but I could not find anything about how to actually set it; the PyTorch documentation was not clear to me.


4 Answers


The max_split_size_mb configuration value can be set as an environment variable.

The exact syntax is documented at https://pytorch.org/docs/stable/notes/cuda.html#memory-management, but in short:

The behavior of caching allocator can be controlled via environment variable PYTORCH_CUDA_ALLOC_CONF. The format is PYTORCH_CUDA_ALLOC_CONF=<option>:<value>,<option2>:<value2>...

Available options:

  • max_split_size_mb prevents the allocator from splitting blocks larger than this size (in MB). This can help prevent fragmentation and may allow some borderline workloads to complete without running out of memory. Performance cost can range from ‘zero’ to ‘substantial’ depending on allocation patterns. Default value is unlimited, i.e. all blocks can be split. The memory_stats() and memory_summary() methods are useful for tuning. This option should be used as a last resort for a workload that is aborting due to ‘out of memory’ and showing a large amount of inactive split blocks.

...
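As the quoted note says, torch.cuda.memory_stats() and torch.cuda.memory_summary() are useful for checking whether fragmentation (a large amount of inactive split blocks) is actually what you're hitting. A minimal sketch (the stats key name is the one recent PyTorch versions use):

import torch

# Reproduce, or get close to, the OOM point, then inspect the caching allocator.
# A large "inactive split" figure suggests fragmentation that max_split_size_mb targets.
print(torch.cuda.memory_summary(device=0, abbreviated=True))

stats = torch.cuda.memory_stats(device=0)
print("inactive split bytes:", stats.get("inactive_split_bytes.all.current", 0))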

So, you should be able to set an environment variable in a manner similar to the following:

Windows: `set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512`

Linux: `export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512`

Which one you need depends on what OS you're using; in your case, for Google Colab, you might find "Setting environment variables in Google Colab" helpful.
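In a Colab notebook, for example, the IPython %env magic can set it from a cell, as long as that cell runs before any CUDA work (a minimal sketch; 512 is just an example value to tune):

%env PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512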

  • I played with several values for the `mb` size but couldn't get it to work (I have an old 1080ti), but passing in this option did the trick: `--n_samples 1` – zenw0lf Mar 25 '23 at 04:06
  • Being curious, what's the default value of this option? – NeoZoom.lua Jul 20 '23 at 15:59
  • 1
    @zenw0lf Passing in `--n_samples 1` to what? – Zorgoth Aug 04 '23 at 18:13
  • @Zorgoth To the command that you run that throws the OOM exception. In my case was to the Stable Diffusion command as seen here: https://github.com/CompVis/stable-diffusion – zenw0lf Aug 30 '23 at 14:42

Adding to the other answer, the size to use really depends on the numbers in your error message, but if you're running Python, putting

import os
# Replace <enter-size-here> with a size in MB, e.g. 128 or 512
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:<enter-size-here>"

at the start of the script has sometimes worked for me. Try different sizes.
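One caveat: as far as I know, the caching allocator reads PYTORCH_CUDA_ALLOC_CONF when it is first initialized, so the variable has to be set before the first CUDA allocation (ideally before anything is moved to the GPU); setting it later in the same process has no effect.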


Another option to try:

torch.cuda.empty_cache() 

This releases cached blocks that the allocator is holding but that are no longer in use by any tensor.
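Note that empty_cache() cannot free memory that is still backing live tensors, so it usually only helps if you first drop references to the tensors you no longer need. A minimal sketch (the tensor here is just an illustrative allocation):

import gc
import torch

x = torch.zeros(1024, 1024, 256, device="cuda")  # ~1 GiB example allocation
del x                      # drop the last reference to the tensor
gc.collect()               # make sure Python has actually released it
torch.cuda.empty_cache()   # hand the now-unused cached blocks back to the driver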


For anyone using Stable Diffusion with a GTX 1660 or another 16xx-series 6 GB card, this option is not actually required with the latest Nvidia drivers. With driver version 531, which makes Stable Diffusion much faster, you will need `--medvram` instead, though.
