
I am trying to get the output of a neural network which I have already trained. The input is an image of size 300x300. I am using a batch size of 1, but I still get a CUDA out of memory error after I have successfully obtained the output for 25 images.

I tried torch.cuda.empty_cache(), but this still doesn't seem to solve the problem. Code:

import torch
import torch.utils.data

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Reshape the input images into an N x 1 x 300 x 300 float tensor and move them to the GPU
train_x = torch.tensor(train_x, dtype=torch.float32).view(-1, 1, 300, 300)
train_x = train_x.to(device)
dataloader = torch.utils.data.DataLoader(train_x, batch_size=1, shuffle=False)

right = []
for i, left in enumerate(dataloader):
    print(i)
    temp = model(left).view(-1, 1, 300, 300)
    right.append(temp.to('cpu'))  # keep the result on the CPU
    del temp
    torch.cuda.empty_cache()

The for loop completes 25 iterations every time before giving the memory error.

Each iteration sends a new image through the network, so I don't really need to keep the previous computation results in GPU memory after each iteration. Is there any way to achieve this?


2 Answers


I figured out where I was going wrong. I am posting the solution as an answer for others who might be struggling with the same problem.

Basically, what PyTorch does is create a computational graph whenever I pass data through my network and store the intermediate computations in GPU memory, in case I want to calculate the gradient during backpropagation. But since I only wanted to perform a forward pass, I simply needed to specify torch.no_grad() for my model.
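A quick way to see this (a small sketch, assuming the same model and device as in the question, with the model on that device, and a dummy input of the right shape):

x = torch.randn(1, 1, 300, 300, device=device)

out = model(x)
print(out.requires_grad, out.grad_fn)   # True, a backward node: the graph is kept alive

with torch.no_grad():
    out = model(x)
print(out.requires_grad, out.grad_fn)   # False, None: nothing extra is stored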

Thus, the for loop in my code could be rewritten as:

for i, left in enumerate(dataloader):
    print(i)
    with torch.no_grad():  # no computational graph is built for this forward pass
        temp = model(left).view(-1, 1, 300, 300)
    right.append(temp.to('cpu'))
    del temp
    torch.cuda.empty_cache()

Wrapping the forward pass in no_grad() tells PyTorch that I don't want to store any of the intermediate computations, thus freeing up my GPU memory.
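As a side note (not part of the original fix), for pure inference it is also common to put the model in evaluation mode and wrap the whole loop, rather than each forward pass, in the context manager. A minimal sketch, assuming the same model, dataloader and right list as above:

model.eval()               # disable dropout, use running stats in batchnorm
right = []
with torch.no_grad():      # no graph is built anywhere inside this block
    for i, left in enumerate(dataloader):
        temp = model(left).view(-1, 1, 300, 300)
        right.append(temp.cpu())  # move each result off the GPU right away

With the graph disabled, the del temp and torch.cuda.empty_cache() calls inside the loop are no longer strictly necessary.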

  • That's interesting. Does changing the mode of the model (from train to eval) help? I am wondering if there's any internal mechanism that automatically tells PyTorch that the mode has been changed to eval, so there's no need to save the computations? In other words, do I still need 'with torch.no_grad()' for validation and inference if net.eval() does not explicitly tell PyTorch not to save the computations during the forward pass? – samra irshad Aug 03 '20 at 03:06
  • In order to do inference (just the forward pass), you only need to specify net.eval(), which disables your dropout and batchnorm layers, putting the model in evaluation mode. However, it is highly recommended to also use it with torch.no_grad(), since that disables the autograd engine (which you probably don't want during inference), saving you both time and memory. Doing only net.eval() would still track operations for gradient computation, making it slower and consuming your memory. – ntd Aug 03 '20 at 04:22
  • If I send the data tensors (let's say predictions and ground truth) to the CPU through .numpy().cpu(), do I still need to mention 'with torch.no_grad()'? – samra irshad Aug 03 '20 at 06:54
  • If your variable has `requires_grad=True`, then you cannot directly call .numpy(). You will first have to do .detach() to tell PyTorch that you do not want to compute gradients for that variable. Next, if your variable is on the GPU, you will need to send it to the CPU in order to convert it to numpy with .cpu(). Thus, it will be something like `var.detach().cpu().numpy()` (see the sketch after these comments). – ntd Aug 03 '20 at 18:31
  • But with torch.no_grad(), you will not need to mention .detach(), since the gradients are not being computed anyway. – ntd Aug 03 '20 at 18:34
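
To illustrate the conversion patterns discussed in these comments (a small sketch; pred and left are just placeholder names):

# With gradients enabled, the output carries a graph, so detach first:
pred = model(left)                  # pred.requires_grad is True
arr = pred.detach().cpu().numpy()   # detach -> move to CPU -> convert to NumPy

# Inside torch.no_grad() no graph is attached, so detach() is not needed:
with torch.no_grad():
    pred = model(left)
arr = pred.cpu().numpy()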

Answering exactly the question of how to clear CUDA memory in PyTorch: in Google Colab I tried torch.cuda.empty_cache(), but it didn't help me. Using the following code really helped me to flush the GPU:

import gc
import torch

torch.cuda.empty_cache()  # release unused cached blocks held by the allocator
gc.collect()              # collect unreferenced Python objects

This issue may help.
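
Note that empty_cache() can only return memory that is no longer referenced by any tensor, so in a notebook it usually only helps after the references are dropped first. A sketch of the full pattern (model and train_x are just placeholders for whatever is holding GPU memory):

import gc
import torch

del model      # drop the Python references to the objects holding GPU memory
del train_x

gc.collect()              # let Python actually free the unreferenced objects
torch.cuda.empty_cache()  # hand the freed cached blocks back to the driver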
