Yes, this also happens on my PC with the following configuration:
- Ubuntu 20.04.1
- PyTorch 1.7.1+cu110
According to this fastai discussion: https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265/8, this is related to the Python garbage collector in an IPython environment.
```python
import gc
import torch

def pretty_size(size):
    """Pretty prints a torch.Size object."""
    assert isinstance(size, torch.Size)
    return " × ".join(map(str, size))

def dump_tensors(gpu_only=True):
    """Prints a list of the Tensors being tracked by the garbage collector."""
    total_size = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    print("%s:%s%s %s" % (type(obj).__name__,
                                          " GPU" if obj.is_cuda else "",
                                          " pinned" if obj.is_pinned() else "",
                                          pretty_size(obj.size())))
                    total_size += obj.numel()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                # Objects such as nn.Parameter that wrap a tensor in .data
                if not gpu_only or obj.data.is_cuda:
                    print("%s → %s:%s%s%s %s" % (type(obj).__name__,
                                                 type(obj.data).__name__,
                                                 " GPU" if obj.data.is_cuda else "",
                                                 " pinned" if obj.data.is_pinned() else "",
                                                 " grad" if obj.requires_grad else "",
                                                 pretty_size(obj.data.size())))
                    total_size += obj.data.numel()
        except Exception:
            pass
    # Note: total_size counts elements (numel), not bytes.
    print("Total size:", total_size)
```
If I do something like

```python
import torch as th

a = th.randn(10, 1000, 1000)
aa = a.cuda()
del aa
th.cuda.empty_cache()
```

I will not see any decrease in nvidia-smi/nvtop.
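As a quick cross-check, you can also query PyTorch's own memory counters next to nvidia-smi (a minimal sketch; the `report` helper below is just an illustrative name):

```python
import torch as th

def report(tag):
    # memory_allocated: bytes currently held by live tensors
    # memory_reserved: bytes the caching allocator keeps reserved from the driver
    print("%s: allocated=%.1f MiB, reserved=%.1f MiB" % (
        tag,
        th.cuda.memory_allocated() / 1024 ** 2,
        th.cuda.memory_reserved() / 1024 ** 2))

a = th.randn(10, 1000, 1000)
aa = a.cuda()
report("after .cuda()")        # 10 * 1000 * 1000 float32 values ≈ 38 MiB allocated
del aa
report("after del")            # only drops if nothing else (e.g. IPython's Out history) still references the tensor
th.cuda.empty_cache()
report("after empty_cache()")  # reserved memory is returned to the driver once nothing references it
```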
But you can find out what is happening with the handy function

```python
dump_tensors()
```

and you may observe output like the following:

```
Tensor: GPU 10 × 1000 × 1000
Total size: 10000000
```
That means the gc still holds the resources. For more discussion of Python's gc mechanism, see:
- Force garbage collection in Python to free memory
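If the goal is to actually release that memory, a minimal sketch based on the discussions linked above is to drop every remaining reference, force a collection, and only then empty PyTorch's cache (the `Out[5]` index below is a hypothetical placeholder for whichever cell displayed the tensor):

```python
import gc
import torch

# In IPython/Jupyter, the output history can keep a tensor alive even after `del`,
# e.g. clear it with `del Out[5]` or wipe the session state with `%reset -f`.
gc.collect()               # destroy tensors that are no longer reachable
torch.cuda.empty_cache()   # return cached blocks to the driver so nvidia-smi reflects the change
```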