RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [400, 14]] is at version 2; expected version 1 instead

Question

I'm working on the following Kaggle challenge.
TL;DR: I want to classify a sequence (time series) of measurements into 1 of K classes using an LSTM.
I'm trying to overfit the model on 2 sequences. My input has shape (B, N, M):

B: batch size = 1
N: sequence length = 128
M: number of features = 14 (the number of measurements at each timestamp)

My model is a very simple LSTM:

import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, num_layers):
        super(LSTMClassifier, self).__init__()
        self.in_dim = in_dim
        self.hidden_dim = hidden_dim
        self.out_dim = out_dim
        self.num_layers = num_layers

        # batch_first=True: inputs are (B, N, M)
        self.lstm = nn.LSTM(in_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        # ht holds the final hidden state of every layer: (num_layers, B, hidden_dim)
        lstm_out, (ht, ct) = self.lstm(x)
        # classify from the last layer's final hidden state -> (B, out_dim) logits
        y = self.fc(ht[-1].reshape(-1, self.hidden_dim))
        return y
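As a quick check of the shapes described above, here is a minimal sketch that restates a trimmed copy of the model and runs one random batch through it. The class count `out_dim=3` is a hypothetical value (the post only says K classes) and `num_layers=1` is an assumption. `nn.LSTM` returns `ht` with shape `(num_layers, B, hidden_dim)`, so `ht[-1]` is the last layer's final hidden state with shape `(B, hidden_dim)`.

```python
import torch
import torch.nn as nn


class LSTMClassifier(nn.Module):
    # trimmed copy of the question's model so this snippet runs on its own
    def __init__(self, in_dim, hidden_dim, out_dim, num_layers):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(in_dim, hidden_dim, num_layers=num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        _, (ht, _) = self.lstm(x)     # ht: (num_layers, B, hidden_dim)
        return self.fc(ht[-1])        # last layer's final state -> (B, out_dim)


torch.manual_seed(0)
model = LSTMClassifier(in_dim=14, hidden_dim=100, out_dim=3, num_layers=1)
x = torch.randn(1, 128, 14)           # (B, N, M) = (1, 128, 14), as in the question
_, (ht, _) = model.lstm(x)
logits = model(x)
print(ht.shape)                       # torch.Size([1, 1, 100])
print(logits.shape)                   # torch.Size([1, 3])
```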

And the training loop is:

import time
import torch
from tqdm import tqdm

def train_lstm_model(model, data_loader, num_epochs, loss_cls, optimizer_cls, learning_rate):
    start = time.time()

    # move the model first, then build the optimizer over its parameters
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)

    loss = loss_cls()
    optimizer = optimizer_cls(model.parameters(), lr=learning_rate)

    for epoch in tqdm(range(num_epochs)):
        for i, (batch_x, batch_y) in enumerate(data_loader):
            batch_x = batch_x.to(device).float()
            batch_y = batch_y.to(device).long()

            optimizer.zero_grad()

            # no hidden state is carried between batches; each sequence
            # starts from the LSTM's default zero state
            y_predicted = model(batch_x)
            l = loss(y_predicted, batch_y)

            l.backward()
            optimizer.step()
#            print(f'epoch {epoch+1}, batch {i+1}: loss = {l.item()} |',
#                  f'train accuracy: {eval_lstm_model(model, data_loader.dataset, hidden)}')
                

    end = time.time()
    print(f'Training took {end-start} seconds.')

And my setup code is:

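import numpy as np
import torch
import torch.nn as nn

loss_cls = nn.CrossEntropyLoss
optimizer_cls = torch.optim.SGD
hidden_dim = 100

# num_layers is required by the constructor; a single layer is assumed here
model_lstm = LSTMClassifier(X_of.shape[-1], hidden_dim, len(np.unique(y_train)), num_layers=1)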

learning_rate = 0.01
num_epochs = 1000
train_lstm_model(model_lstm, overfit_loader, num_epochs, loss_cls, optimizer_cls, learning_rate)

The overfit_loader is a DataLoader which contains only 2 samples.
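Because `out_dim` is set to `len(np.unique(y_train))`, `nn.CrossEntropyLoss` expects raw logits of shape `(B, K)` (no softmax) and targets that are class indices in `[0, K)` with dtype `long`. The post does not say whether the raw labels are already `0..K-1`; if they are not, one way to encode them is `np.unique(..., return_inverse=True)`. A minimal sketch with hypothetical labels:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical raw labels that are not 0..K-1.
y_train = np.array([3, 7, 7, 3, 9])

# classes: sorted unique labels; y_encoded: index of each label within classes
classes, y_encoded = np.unique(y_train, return_inverse=True)
num_classes = len(classes)            # what the question passes as out_dim
print(classes, y_encoded)             # [3 7 9] [0 1 1 0 2]

logits = torch.randn(len(y_encoded), num_classes)   # (B, K) raw scores, no softmax
targets = torch.from_numpy(y_encoded).long()        # (B,) class indices in [0, K)
loss = nn.CrossEntropyLoss()(logits, targets)
print(loss.item())
```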

But training fails with the following error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-87-5f725d0ecc50> in <module>
     27 learning_rate = 0.001
     28 num_epochs = 100
---> 29 train_lstm_model(model_lstm, overfit_loader, num_epochs, loss_cls, optimizer_cls, learning_rate)

<ipython-input-86-ba60b3627f13> in train_lstm_model(model, data_loader, num_epochs, loss_cls, optimizer_cls, learning_rate, test_loader)
     20             l = loss(y_predicted, batch_y)
     21 
---> 22             l.backward(retain_graph=True)
     23             optimizer.step()
     24 

/usr/local/lib64/python3.6/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    219                 retain_graph=retain_graph,
    220                 create_graph=create_graph)
--> 221         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    222 
    223     def register_hook(self, hook):

/usr/local/lib64/python3.6/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
    130     Variable._execution_engine.run_backward(
    131         tensors, grad_tensors_, retain_graph, create_graph,
--> 132         allow_unreachable=True)  # allow_unreachable flag
    133 
    134 

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace  
operation: [torch.cuda.FloatTensor [400, 14]] is at version 2; expected version 1 instead. Hint: the  
backtrace further above shows the operation that failed to compute its gradient. The variable in question  
was changed in there or anywhere later. Good luck!
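The traceback can be reproduced in miniature. A minimal sketch, assuming a loop that passes `hidden` from one batch to the next without detaching it and calls `l.backward(retain_graph=True)` (the line shown in the traceback): the second backward pass walks back into the previous batch's graph, whose saved LSTM weights have since been updated in place by `optimizer.step()`, so autograd's version check fails. The tensor in the message, `[400, 14]`, has the shape of an `nn.LSTM`'s `weight_ih_l0`, which is `(4 * hidden_size, input_size)`, here `(4 * 100, 14)`. The function `run`, the 3 classes and the small hidden size are illustrative choices. Detaching the carried state (only needed if consecutive batches really are dependent) removes the need for `retain_graph=True` and avoids the error.

```python
import torch
import torch.nn as nn


def run(detach_hidden: bool, steps: int = 3) -> str:
    """Train a tiny LSTM classifier for a few steps while carrying (h, c)
    from one step to the next. Returns "ok", or the RuntimeError message."""
    torch.manual_seed(0)
    lstm = nn.LSTM(input_size=14, hidden_size=8, batch_first=True)
    head = nn.Linear(8, 3)                      # 3 classes: hypothetical
    params = list(lstm.parameters()) + list(head.parameters())
    optimizer = torch.optim.SGD(params, lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    x = torch.randn(1, 128, 14)                 # (B, N, M) as in the question
    y = torch.tensor([1])
    hidden = None                               # None: LSTM starts from zeros

    try:
        for step in range(steps):
            optimizer.zero_grad()
            _, hidden = lstm(x, hidden)
            loss = loss_fn(head(hidden[0][-1]), y)
            if detach_hidden:
                loss.backward()
                # keep the state's values but cut its link to this step's graph
                hidden = tuple(t.detach() for t in hidden)
            else:
                # the workaround from the question: keep old graphs alive,
                # so the next backward pass walks into them again
                loss.backward(retain_graph=True)
            optimizer.step()                    # updates weights in place
        return "ok"
    except RuntimeError as err:
        return str(err)


print(run(detach_hidden=True))                  # ok
print(run(detach_hidden=False)[:80])
```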

EDIT: Following @SzymonMaszke's comment, I've removed the loss printing and stopped reusing the hidden state. The exception is gone, but the loss still doesn't converge below 0.7.
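One possible reading of a loss stuck near 0.7 with 0.5 accuracy on 2 samples: if the two samples have different labels (an assumption; the post does not say), a model that ignores its input and gives both samples the same prediction can do no better than splitting the probability evenly between the two labels. That yields a mean cross-entropy of ln 2 ≈ 0.693 and, after `argmax`, 1 correct prediction out of 2. The sketch below computes this for a hypothetical K = 5.

```python
import math
import torch
import torch.nn.functional as F

K = 5                                   # hypothetical number of classes
targets = torch.tensor([0, 1])          # two samples with different labels

# Identical logits for both samples (the input is ignored), with the
# probability mass split evenly between the two classes that occur.
logits = torch.full((2, K), -100.0)
logits[:, 0] = 0.0
logits[:, 1] = 0.0

loss = F.cross_entropy(logits, targets)
accuracy = (logits.argmax(dim=1) == targets).float().mean().item()
print(round(loss.item(), 4), round(math.log(2), 4))   # 0.6931 0.6931
print(accuracy)                                       # 0.5
```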

Any help would be appreciated. Thanks!

May I ask why you are retaining the graph when backpropagating? — Ivan, Dec 27 '20 at 22:00
Also, could you try removing the `loss` calculation (and the subsequent `print`)? If that doesn't help, could you try changing this line in your `LSTMClassifier`: `x, hidden = self.lstm(x, hidden)` to `_, (ht, ct) = self.lstm(x)`, and use `ht` in the return, i.e. `return self.fc(ht.reshape(-1, self.hidden_dim)), (ht, ct)`. — Szymon Maszke, Dec 27 '20 at 22:24
@Ivan because before I added this line I got ```RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.``` — Gil Ben David, Dec 28 '20 at 07:09
@SzymonMaszke I've changed the code and the error is gone, but the loss doesn't go below 0.7, and the accuracy is 0.5 on just 2 samples after 100 epochs... — Gil Ben David, Dec 28 '20 at 08:55
@GilBenDavid go according to [this answer](https://stackoverflow.com/questions/54411662/lstm-autoencoder-always-returns-the-average-of-the-input-sequence/54480128). You have a lot of errors if you are backpropagating through the graph a second time (you shouldn't do that). You didn't post your `loss` function either (there is a functional version of each loss, so you don't need `loss_cls`...). Why are you using `hidden` and constantly passing it to the model? Are samples in different batches dependent? — Szymon Maszke, Dec 28 '20 at 11:26
@SzymonMaszke you're right, I shouldn't keep the hidden state. I'm using `nn.CrossEntropyLoss`. I've followed the answer you cited, and the loss still isn't getting below 0.7. — Gil Ben David, Dec 29 '20 at 08:33
OK, something even weirder just happened... I tried sampling random data, and the net overfit it in just 2 epochs. Is it possible that the data is so damaged that the net can't overfit just 2 samples? — Gil Ben David, Dec 29 '20 at 09:07
@GilBenDavid Go with a `hidden_dim` of at least `100` and check the results for overfitting; increase it if needed. No, that's not possible: if the neural network has enough capacity, it will be able to fit even noise. — Szymon Maszke, Dec 29 '20 at 12:01
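The random-data experiment from the comments can be written as a self-contained sanity check. A minimal sketch, assuming two random sequences of shape (128, 14) with different labels and 2 classes; it uses `hidden_dim=100` as suggested above, starts every batch from a fresh zero state, and swaps the question's SGD for Adam (`lr=1e-3`) so it converges in a few hundred steps. If a loop like this fits random data while the real two samples do not, that may point at the real inputs rather than at the training loop.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Two random sequences shaped like the question's data, with different labels.
x = torch.randn(2, 128, 14)                 # (samples, N, M)
y = torch.tensor([0, 1])                    # assumed: 2 classes
loader = DataLoader(TensorDataset(x, y), batch_size=1, shuffle=True)

lstm = nn.LSTM(14, 100, batch_first=True)   # hidden_dim = 100
fc = nn.Linear(100, 2)
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(fc.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):
    for batch_x, batch_y in loader:
        optimizer.zero_grad()
        _, (ht, _) = lstm(batch_x)          # fresh zero state for every batch
        loss = loss_fn(fc(ht[-1]), batch_y)
        loss.backward()
        optimizer.step()

# Evaluate on both sequences at once.
with torch.no_grad():
    _, (ht, _) = lstm(x)
    logits = fc(ht[-1])
    final_loss = loss_fn(logits, y).item()
    accuracy = (logits.argmax(dim=1) == y).float().mean().item()

print(final_loss, accuracy)
```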
