
On my Windows 10 machine, if I create a GPU tensor directly, I can successfully release its memory.

import torch
a = torch.zeros(300000000, dtype=torch.int8, device='cuda')
del a
torch.cuda.empty_cache()

But if I create a CPU tensor and convert it to a GPU tensor, I can no longer release its memory.

import torch
a = torch.zeros(300000000, dtype=torch.int8)
a.cuda()
del a
torch.cuda.empty_cache()

Why is this happening?


5 Answers


At least on Ubuntu, your script does not release the memory when it is run in an interactive shell, but it works as expected when run as a script. I think there is a reference issue with the unassigned a.cuda() call (it is not an in-place operation; it returns a new tensor). The following will work both in the interactive shell and as a script.

import torch
a = torch.zeros(300000000, dtype=torch.int8)
a = a.cuda()
del a
torch.cuda.empty_cache()
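
A quick way to verify the release (my addition, a minimal sketch; torch.cuda.memory_allocated() reports the memory currently held by live tensors):

import torch

a = torch.zeros(300000000, dtype=torch.int8)
a = a.cuda()                          # .cuda() returns a new tensor; keep the result
print(torch.cuda.memory_allocated())  # roughly 300 MB is now held by `a`

del a
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # should drop back to 0 once no reference remains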
hkchengrex

Yes, this also happens on my PC with the following configuration:

  • Ubuntu 20.04.1
  • PyTorch 1.7.1+cu110

According to information from this fastai discussion: https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265/8

This is related to the Python garbage collector in an IPython environment. The helper below (from that discussion) prints the tensors that the garbage collector is still tracking:

import gc
import torch

def pretty_size(size):
    """Pretty prints a torch.Size object"""
    assert isinstance(size, torch.Size)
    return " × ".join(map(str, size))

def dump_tensors(gpu_only=True):
    """Prints a list of the Tensors being tracked by the garbage collector."""
    total_size = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    print("%s:%s%s %s" % (type(obj).__name__,
                                          " GPU" if obj.is_cuda else "",
                                          " pinned" if obj.is_pinned() else "",
                                          pretty_size(obj.size())))
                    total_size += obj.numel()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                if not gpu_only or obj.is_cuda:
                    print("%s → %s:%s%s%s %s" % (type(obj).__name__,
                                                 type(obj.data).__name__,
                                                 " GPU" if obj.is_cuda else "",
                                                 " pinned" if obj.data.is_pinned() else "",
                                                 " grad" if obj.requires_grad else "",
                                                 pretty_size(obj.data.size())))
                    total_size += obj.data.numel()
        except Exception:
            pass
    print("Total size:", total_size)

If I do something like

import torch as th
a = th.randn(10, 1000, 1000)
aa = a.cuda()
del aa
th.cuda.empty_cache()

you will not see any decrease in nvidia-smi/nvtop. But you can find out what is happening using the handy function above:

dump_tensors()

and you may observe the following information:

Tensor: GPU 10 × 1000 × 1000
Total size: 10000000

That means the garbage collector still holds the resources.
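
To confirm this, a minimal sketch of my own (assuming the snippet above was just run in the same session): force a collection, then check the allocator's counter.

import gc
import torch as th

gc.collect()                       # reclaim the unreachable CUDA tensor the GC was still tracking
th.cuda.empty_cache()              # return the now-unused cached blocks to the driver
print(th.cuda.memory_allocated())  # should read 0 if nothing else references the tensor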

For more on the Python GC mechanism, see:

  1. Force garbage collection in Python to free memory
wstcegg

I met the same issue. Solution:

cuda = torch.device('cuda')
a = a.to(cuda)  # .to() returns a new tensor; reassign it, just like with .cuda()
Hey TV

To add to the excellent answer from @wstcegg, what worked for me to clean my GPU cache on Ubuntu (it did not work under Windows) was:

import gc
import torch
gc.collect()
torch.cuda.empty_cache()

You might also want to delete the objects you have created first; see How can I explicitly free memory in Python?
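
Putting the pieces together, a minimal sketch (the tensor and its size are placeholders of mine, not part of the original answer):

import gc
import torch

a = torch.zeros(300000000, dtype=torch.int8).cuda()

del a                     # drop the Python reference first
gc.collect()              # let the garbage collector reclaim anything unreachable
torch.cuda.empty_cache()  # hand the cached blocks back to the driver

# both counters should now be (close to) zero
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())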

For more details about garbage collection, see this good reference; the interesting part is quoted below:

https://stackabuse.com/basics-of-memory-management-in-python/

Why Perform Manual Garbage Collection?

We know that the Python interpreter keeps track of references to objects used in a program. In earlier versions of Python (until version 1.6), the Python interpreter used only the reference counting mechanism to handle memory. When the reference count drops to zero, the Python interpreter automatically frees the memory. This classical reference counting mechanism is very effective, except that it fails to work when the program has reference cycles. A reference cycle happens when one or more objects reference each other, so the reference count never reaches zero.

Let's consider an example.

>>> def create_cycle():
...     list = [8, 9, 10]
...     list.append(list)
...     return list
... 
>>> create_cycle()
[8, 9, 10, [...]]

The above code creates a reference cycle, where the object list refers to itself. Hence, the memory for the object list will not be freed automatically when the function returns. The reference cycle problem can't be solved by reference counting. However, it can be solved by changing the behavior of the garbage collector in your Python application.

To do so, we can use the gc.collect() function of the gc module.

import gc
n = gc.collect()
print("Number of unreachable objects collected by GC:", n)

gc.collect() returns the number of objects it has collected and deallocated.

There are two ways to perform manual garbage collection: time-based or event-based garbage collection.

Time-based garbage collection is pretty simple: the gc.collect() function is called after a fixed time interval.
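
As an illustration of the time-based variant, here is a minimal sketch of mine (the 60-second interval is arbitrary):

import gc
import threading

def collect_every(interval_seconds=60.0):
    """Time-based garbage collection: run gc.collect() on a fixed schedule."""
    gc.collect()
    timer = threading.Timer(interval_seconds, collect_every, args=(interval_seconds,))
    timer.daemon = True  # do not keep the process alive just for this timer
    timer.start()

collect_every()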

Marine Galantin

You should not use torch.cuda.empty_cache(), as it will slow down your code for no gain: https://discuss.pytorch.org/t/what-is-torch-cuda-empty-cache-do-and-where-should-i-add-it/40975
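
To see why it usually brings no gain, here is a sketch of mine (not from the linked thread): memory freed by del goes back to PyTorch's caching allocator and is reused by later allocations, so empty_cache() mainly helps when another process needs the GPU memory.

import torch

a = torch.zeros(300000000, dtype=torch.int8, device='cuda')
del a
# the memory is no longer allocated to a tensor, but it stays in PyTorch's cache...
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
# ...and is reused here without any call to torch.cuda.empty_cache()
b = torch.zeros(300000000, dtype=torch.int8, device='cuda')
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())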

Igor