
I am trying to train a neural network on a very large input (5 × 100,000,000) and it requires much more memory than expected. Here is a minimal example:

import torch
import torch.nn as nn
import torch.optim as optim

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # One filter covering the whole input: 5 * 100,000,000 = 500M weights
        self.conv1 = nn.Conv1d(in_channels=5, out_channels=1, kernel_size=100000000, stride=10)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.sigmoid(x)
        return x

model = Net().cuda()

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.BCELoss()

# Random input of shape (1, 5, 100,000,000); it requires no gradient
data = torch.normal(torch.zeros(1, 5, 100000000), torch.ones(1, 5, 100000000))
data = data.cuda()
label = torch.ones(1, 1, 1)
label = label.cuda()

for epoch in range(10):
    output = model(data)
    loss = criterion(output, label)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    print("Epoch :", epoch)

The input is random data; it uses approximately 2 GB, as expected (4 bytes × 5 × 100,000,000 ≈ 1.86 GB). This tensor requires no gradient. The network consists of a single convolutional layer with one filter of the same size as the input, so it has 500M weights, which is another 2 GB. After the forward pass another 2 GB get used. After loss.backward() 8 GB are used, and after optimizer.step() 12 GB are used, which is all the available memory.
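
These per-step numbers can be reproduced with the allocator counters (a minimal sketch; the report helper is mine, while torch.cuda.memory_allocated() and torch.cuda.max_memory_allocated() are the actual counters):

def report(tag):
    # bytes currently held by tensors vs. the peak since program start
    print(tag,
          torch.cuda.memory_allocated() / 1024**3, "GB allocated,",
          torch.cuda.max_memory_allocated() / 1024**3, "GB peak")

report("after data")        # ~2 GB for the input
output = model(data)
report("after forward")     # + weights and the output activation
loss = criterion(output, label)
loss.backward()
report("after backward")    # + 2 GB of weight gradients
optimizer.step()
report("after step")        # + Adam's two state buffers per weight (4 GB more)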

During the second epoch the forward pass runs OK, but during backpropagation I get RuntimeError: CUDA error: out of memory.

What exactly is kept in GPU memory during an epoch? Why is the memory not released after the optimization step finishes? How can I reduce memory usage in this case?

UPD: Looks like my problem is similar to this issue: https://discuss.pytorch.org/t/how-to-free-gpu-memory-and-delete-memory-allocated-variables/20856

UPD2: Got an answer from the PyTorch developers here https://github.com/pytorch/pytorch/issues/12651 , but it just says that it is not a PyTorch issue, but a cuDNN one.
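
A quick way to check whether cuDNN is the culprit (my assumption, not something suggested in the issue) is to disable it and re-run the loop:

torch.backends.cudnn.enabled = False   # fall back to PyTorch's native convolution kernels
# ... run the same training loop as above ...
print(torch.cuda.max_memory_allocated() / 1024**3, "GB peak")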

  • Can you not allocate a specific percentage of memory that is allowed to be used at one time with PyTorch? I ask because I have done this in TensorFlow. Aside from using some of the methods in torch.cuda like empty_cache(), I am not too sure. – nagrom97 Oct 12 '18 at 13:02
  • Looks like PyTorch cannot set a limit on GPU memory usage: https://stackoverflow.com/questions/49529372/force-gpu-memory-limit-in-pytorch. I've tried empty_cache(); it had no effect. – Olha Romaniuk Oct 13 '18 at 07:18

1 Answer


Since you call loss.backward(), PyTorch has to keep the graph needed to calculate the gradients, and this contributes to the large memory allocation. If you want to drop the gradient history of a tensor, call .detach() on it.
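
For example (a minimal sketch; whether it helps depends on where the memory is actually held):

# Detaching cuts a tensor out of the autograd graph, so activations
# saved for its backward pass become eligible for freeing.
out = model(data)
frozen = out.detach()        # shares storage with out, but requires_grad=False
# For pure inference, avoid building a graph at all:
with torch.no_grad():
    preds = model(data)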

To release unused cached memory, you can call torch.cuda.empty_cache(). If you want to dive into the details, the CUDA semantics page may be a starting point.
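
For instance (a sketch; output and loss are the loop variables from your example):

del output, loss                       # drop the Python references to the graph first
torch.cuda.empty_cache()               # then return unused cached blocks to the driver
print(torch.cuda.memory_allocated())   # bytes still held by live tensors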

  • empty_cache() doesn't work. As far as I understand, you suggest detaching variables when I don't need their gradients any more, am I right? I've tried this, but it had no effect. – Olha Romaniuk Oct 13 '18 at 07:27
  • It's been a while since I dealt with a very similar problem, and I don't exactly remember the solution. – randomwalker Oct 14 '18 at 14:18
  • Honestly, it was more about trial and error than about seriously understanding the way CUDA memory is allocated by PyTorch. Unfortunately, I don't have a computer around right now to try things out. You can either detach or delete a tensor, depending on the situation. Calling empty_cache() after this operation should free the memory. May I ask why you use convolution? Applying convolution with a kernel size equal to the input size is more or less the same as applying e.g. nn.Linear (i.e. a fully connected layer), except for side effects such as padding (see the sketch after this thread). – randomwalker Oct 14 '18 at 14:29
  • The example in the question is purely artificial, designed to show the issue I am dealing with. I've tried the linear layer equivalent to the convolution, and it runs fine with such an input; even increasing the input size doesn't cause the same behaviour as the convolution, so side effects are a good idea to explore. And I really want to understand where the memory is going. The input is 2 GB, the weights are 2 GB, 2 GB more for gradients, that is 6 GB; what is using the remaining 6 GB, and can we somehow get rid of some of it? – Olha Romaniuk Oct 14 '18 at 17:41
  • Have you tried a smaller input, such as 0.2 GB? I am asking because there may be a memory leak. If CUDA runs out of memory after 10+ epochs even with such a small input, that would be a hint of something like this. Otherwise, we have to further explore the implementation of the convolution class. – randomwalker Oct 15 '18 at 06:47
  • I've tried an even better test, and it looks like it is a bug; I've reported it here: https://github.com/pytorch/pytorch/issues/12651 Maybe someone will explain this behavior. – Olha Romaniuk Oct 15 '18 at 14:22
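
For reference, here is the nn.Linear equivalent mentioned in the comments (a minimal sketch for illustration, not code from the thread; it has the same 500M weights as the full-size convolution filter):

import torch
import torch.nn as nn

class LinearNet(nn.Module):

    def __init__(self):
        super(LinearNet, self).__init__()
        # 5 * 100,000,000 flattened inputs -> 1 output:
        # the same 500M weights as the conv filter
        self.fc = nn.Linear(5 * 100000000, 1)

    def forward(self, x):
        x = x.view(x.size(0), -1)    # (N, 5, 100000000) -> (N, 500000000)
        return torch.sigmoid(self.fc(x))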