
The following code ran without error on the PyTorch nightly build (1.5.0.dev20200206). However, once I installed the stable 1.5 build, the RNN `forward` method defined below started to throw an error:

def forward(self, sequence):
    print('Sequence shape:', sequence.shape)
    sequence = sequence.clone().view(len(sequence), 1, -1)
    print("flattened shape: ", sequence.shape)
    lstm_out, hidden = self.lstm(
        sequence, self.hidden
    )
    print(lstm_out.shape)
    out_space = self.hidden2out(lstm_out[:, -1])
    self.hidden = hidden
    print("hiddens")
    print(hidden[0].shape)
    print(hidden[1].shape)
    print(" out_space: ", out_space.shape)
    out_scores = torch.sigmoid(out_space)
    print("out_scores: ", out_scores.shape)
    out = out_scores.squeeze()
    print(out.shape)
    return out

I added the `clone()` call to prevent in-place memory modifications from `view()` and made the variable assignments obviously not in-place. However, I still get the following error:

Sequence shape: torch.Size([200, 19, 62])
flattened shape:  torch.Size([200, 1, 1178])
torch.Size([200, 1, 8])
hiddens
torch.Size([1, 1, 8])
torch.Size([1, 1, 8])
 out_space:  torch.Size([200, 1])
out_scores:  torch.Size([200, 1])
torch.Size([200])
Warning: Error detected in AddmmBackward. Traceback of forward call that caused the error:
  File "main.py", line 240, in <module>
    main_loop(args)
  File "main.py", line 115, in main_loop
    train.run(args)
  File "/data/learnedbloomfilter/python/classifier/train.py", line 519, in run
    args.log_every,
  File "/data/learnedbloomfilter/python/classifier/train.py", line 88, in train
    predictions = model(features)
  File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/learnedbloomfilter/python/classifier/embedding_lstm.py", line 65, in forward
    sequence, self.hidden
  File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 570, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
 (print_stack at /opt/conda/conda-bld/pytorch_1587428190859/work/torch/csrc/autograd/python_anomaly_mode.cpp:60)
Traceback (most recent call last):
  File "main.py", line 240, in <module>
    main_loop(args)
  File "main.py", line 115, in main_loop
    train.run(args)
  File "/data/learnedbloomfilter/python/classifier/train.py", line 519, in run
    args.log_every,
  File "/data/learnedbloomfilter/python/classifier/train.py", line 97, in train
    loss.backward(retain_graph=True)
  File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/data/miniconda3/envs/lbf/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [8, 32]], which is output 0 of TBackward, is at version 2; expected version 1 instead. Hint: the backtrace
further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I have isolated the error to `forward()`, but I can't find the intermediate tensor `[torch.FloatTensor [8, 32]]` that seems to be causing the problem (none of the tensor shapes in my forward method match, so it must be inside the LSTM's `forward()` method). I only use the CPU, not CUDA.
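
For reference, here is a quick standalone check of the LSTM parameter shapes (a hypothetical stand-in built only from the sizes printed above, not the actual model from the gist), since `TBackward` suggests the failing tensor is the transpose of something:

```python
import torch.nn as nn

# Hypothetical stand-in for the model's LSTM: input size 1178, hidden size 8,
# taken from the shapes printed in the output above.
lstm = nn.LSTM(input_size=1178, hidden_size=8, num_layers=1)

# Print every parameter shape and, for 2-D weights, its transpose
# (the failing tensor is "output 0 of TBackward", i.e. a transpose).
for name, p in lstm.named_parameters():
    t_shape = tuple(p.t().shape) if p.dim() == 2 else None
    print(name, tuple(p.shape), "transposed:", t_shape)
```

If I'm reading the output right, the only `[8, 32]` candidate is the transpose of an LSTM weight, which is consistent with the problem being inside the LSTM's `forward()` rather than in any tensor I create myself.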

For the rest of the RNN code, see this gist: https://gist.github.com/yaatehr/aac21cae05b24101f2369c97cfecb47b

Thanks!

yhr
  • You seem to imply that the code was running w/o issues in another version of Pytorch; if this is indeed the case, please edit & update your question accordingly to explicitly clarify this. – desertnaut May 09 '20 at 17:32
  • Just a tip but I would avoid using `out = out_scores.squeeze()` since this will behave differently for batch size 1 and batch size > 1. Generally better to give the dimension you want to squeeze as an argument to avoid this behavior. – jodag May 09 '20 at 17:50
  • Good point, thank you @jodag – yhr May 10 '20 at 16:51
  • Update: I wasn't able to figure out the error, but if anyone finds themselves in a similar situation with the nightly builds [this](https://stackoverflow.com/questions/53639076/how-to-clone-an-old-python-conda-environment-when-links-to-packages-no-longer-w) helped me install the old package that was no longer available – yhr May 10 '20 at 21:42
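
A tiny sketch of the `squeeze()` behaviour jodag describes in the comments (made-up shapes, unrelated to the actual model):

```python
import torch

scores = torch.zeros(5, 1)       # batch size 5
print(scores.squeeze().shape)    # torch.Size([5])

scores = torch.zeros(1, 1)       # batch size 1
print(scores.squeeze().shape)    # torch.Size([]) - 0-d scalar, often unintended
print(scores.squeeze(1).shape)   # torch.Size([1]) - keeps the batch dimension
```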

1 Answer


Post the full code of the model.

The error means that at some point a variable you are tracking gradients for gets modified in place. In-place operations are any PyTorch operations with a trailing underscore (e.g. `torch.add` vs. `torch.add_`). It could also be a variable being reassigned at some point.
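
A minimal sketch (toy tensors, nothing to do with your model) of how an in-place op triggers exactly this error:

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a * 2
c = b ** 2          # pow saves b because it is needed for the backward pass
b.add_(1)           # in-place op bumps b's version counter
c.sum().backward()  # RuntimeError: ... modified by an inplace operation
```

Replacing `b.add_(1)` with `b = b + 1` makes the error go away, because the tensor that autograd saved is left untouched.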

Karl
  • I have clarified the question a bit, but the issue is that I am not using any methods with the underscore suffix, and the only other method I found that caused in-place memory modification was `view()`. To my understanding, cloning the tensor should prevent in-place memory errors on the `sequence` tensor. – yhr May 10 '20 at 16:58