
On my Windows 10 machine, if I create a GPU tensor directly, I can successfully release its memory.

import torch
a = torch.zeros(300000000, dtype=torch.int8, device='cuda')
del a
torch.cuda.empty_cache()

But if I create a CPU tensor and convert it to a GPU tensor, I can no longer release its memory.

import torch
a = torch.zeros(300000000, dtype=torch.int8)
a.cuda()
del a
torch.cuda.empty_cache()

Why is this happening?


5 Answers


At least on Ubuntu, your script does not release the memory when it is run in an interactive shell, but it works as expected when run as a script. I think there is a reference issue with the unassigned a.cuda() call (it is not an in-place operation; it returns a new tensor). The following will work both in the interactive shell and as a script.

import torch
a = torch.zeros(300000000, dtype=torch.int8)
a = a.cuda()
del a
torch.cuda.empty_cache()
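
A quick way to verify the release (my addition, a minimal sketch; torch.cuda.memory_allocated() reports the memory currently held by live tensors):

import torch

a = torch.zeros(300000000, dtype=torch.int8)
a = a.cuda()                          # .cuda() returns a new tensor; keep the result
print(torch.cuda.memory_allocated())  # roughly 300 MB is now held by `a`

del a
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # should drop back to 0 once no reference remains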
hkchengrex

Yes, this also happens on my PC with the following configuration:

  • Ubuntu 20.04.1
  • PyTorch 1.7.1+cu110

According to information from this fastai discussion: https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265/8

This is related to the Python garbage collector in an IPython environment. The helper below (from that discussion) prints the tensors that the garbage collector is still tracking:

import gc
import torch

def pretty_size(size):
    """Pretty prints a torch.Size object"""
    assert isinstance(size, torch.Size)
    return " × ".join(map(str, size))

def dump_tensors(gpu_only=True):
    """Prints a list of the Tensors being tracked by the garbage collector."""
    total_size = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    print("%s:%s%s %s" % (type(obj).__name__,
                                          " GPU" if obj.is_cuda else "",
                                          " pinned" if obj.is_pinned() else "",
                                          pretty_size(obj.size())))
                    total_size += obj.numel()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                if not gpu_only or obj.is_cuda:
                    print("%s → %s:%s%s%s %s" % (type(obj).__name__,
                                                 type(obj.data).__name__,
                                                 " GPU" if obj.is_cuda else "",
                                                 " pinned" if obj.data.is_pinned() else "",
                                                 " grad" if obj.requires_grad else "",
                                                 pretty_size(obj.data.size())))
                    total_size += obj.data.numel()
        except Exception:
            pass
    print("Total size:", total_size)

If I do something like

import torch as th
a = th.randn(10, 1000, 1000)
aa = a.cuda()
del aa
th.cuda.empty_cache()

you will not see any decrease in nvidia-smi/nvtop. But you can find out what is happening using the handy function above:

dump_tensors()

and you may observe the following information:

Tensor: GPU 10 × 1000 × 1000
Total size: 10000000

That means the garbage collector still holds the resources.
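
To confirm this, a minimal sketch of my own (assuming the snippet above was just run in the same session): force a collection, then check the allocator's counter.

import gc
import torch as th

gc.collect()                       # reclaim the unreachable CUDA tensor the GC was still tracking
th.cuda.empty_cache()              # return the now-unused cached blocks to the driver
print(th.cuda.memory_allocated())  # should read 0 if nothing else references the tensor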

For more on the Python GC mechanism, see:

  1. Force garbage collection in Python to free memory
wstcegg

I met the same issue. Solution:

cuda = torch.device('cuda')
a = a.to(cuda)  # .to() returns a new tensor; reassign it, just like with .cuda()
Hey TV

To add to the excellent answer from @wstcegg, what worked for me to clean my GPU cache on Ubuntu (it did not work under Windows) was:

import gc
import torch
gc.collect()
torch.cuda.empty_cache()

You might also want to delete the objects you have created first; see How can I explicitly free memory in Python?
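
Putting the pieces together, a minimal sketch (the tensor and its size are placeholders of mine, not part of the original answer):

import gc
import torch

a = torch.zeros(300000000, dtype=torch.int8).cuda()

del a                     # drop the Python reference first
gc.collect()              # let the garbage collector reclaim anything unreachable
torch.cuda.empty_cache()  # hand the cached blocks back to the driver

# both counters should now be (close to) zero
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())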

For more details about garbage collection, see this good reference; the interesting part is quoted below:

https://stackabuse.com/basics-of-memory-management-in-python/

Why Perform Manual Garbage Collection?

We know that the Python interpreter keeps track of references to objects used in a program. In earlier versions of Python (until version 1.6), the Python interpreter used only the reference counting mechanism to handle memory. When the reference count drops to zero, the Python interpreter automatically frees the memory. This classical reference counting mechanism is very effective, except that it fails to work when the program has reference cycles. A reference cycle happens when one or more objects reference each other, so the reference count never reaches zero.

Let's consider an example.

>>> def create_cycle():
...     list = [8, 9, 10]
...     list.append(list)
...     return list
... 
>>> create_cycle()
[8, 9, 10, [...]]

The above code creates a reference cycle, where the object list refers to itself. Hence, the memory for the object list will not be freed automatically when the function returns. The reference cycle problem can't be solved by reference counting. However, it can be solved by changing the behavior of the garbage collector in your Python application.

To do so, we can use the gc.collect() function of the gc module.

import gc
n = gc.collect()
print("Number of unreachable objects collected by GC:", n)

gc.collect() returns the number of objects it has collected and deallocated.

There are two ways to perform manual garbage collection: time-based or event-based garbage collection.

Time-based garbage collection is pretty simple: the gc.collect() function is called after a fixed time interval.
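
As an illustration of the time-based variant, here is a minimal sketch of mine (the 60-second interval is arbitrary):

import gc
import threading

def collect_every(interval_seconds=60.0):
    """Time-based garbage collection: run gc.collect() on a fixed schedule."""
    gc.collect()
    timer = threading.Timer(interval_seconds, collect_every, args=(interval_seconds,))
    timer.daemon = True  # do not keep the process alive just for this timer
    timer.start()

collect_every()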

Marine Galantin

You should not use torch.cuda.empty_cache(), as it will slow down your code for no gain: https://discuss.pytorch.org/t/what-is-torch-cuda-empty-cache-do-and-where-should-i-add-it/40975
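
To see why it usually brings no gain, here is a sketch of mine (not from the linked thread): memory freed by del goes back to PyTorch's caching allocator and is reused by later allocations, so empty_cache() mainly helps when another process needs the GPU memory.

import torch

a = torch.zeros(300000000, dtype=torch.int8, device='cuda')
del a
# the memory is no longer allocated to a tensor, but it stays in PyTorch's cache...
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
# ...and is reused here without any call to torch.cuda.empty_cache()
b = torch.zeros(300000000, dtype=torch.int8, device='cuda')
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())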

Igor