I was trying to create a Transformer Encoder Block, and in its forward() function I used the "+=" operator:
    def forward(self, x):
        x += self.msa_block(x)
        x += self.mlp_block(x)
        return x
Then I got this error message:
    RuntimeError: one of the variables needed for gradient computation has been
    modified by an inplace operation: [torch.FloatTensor [32, 197, 768]], which
    is output 0 of AddBackward0, is at version 24; expected version 23 instead.
    Hint: enable anomaly detection to find the operation that failed to compute
    its gradient, with torch.autograd.set_detect_anomaly(True).
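For reference, the anomaly detection that the hint mentions can be enabled like this (a minimal sketch with a placeholder nn.Linear model, not my actual block; enabling it makes the backward error carry a second traceback pointing at the offending forward operation):

    import torch
    import torch.nn as nn

    # Global switch: backward errors will now include a second traceback
    # pointing at the forward op that produced the problematic tensor.
    torch.autograd.set_detect_anomaly(True)

    model = nn.Linear(8, 8)     # placeholder module, not my encoder block
    inputs = torch.randn(4, 8)

    # The context-manager form limits the (slow) checks to a single region:
    with torch.autograd.detect_anomaly():
        model(inputs).sum().backward()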
After some searching and trial and error, I found out that the forward() function should look like this:
    def forward(self, x):
        x = self.msa_block(x) + x
        x = self.mlp_block(x) + x
        return x
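For completeness, here is a runnable stand-in for my block with the fixed forward(). The msa_block/mlp_block internals below are just a typical pre-norm ViT-style layout I wrote for this post (the [32, 197, 768] shape in the error comes from a ViT), not necessarily my exact layers:

    import torch
    import torch.nn as nn

    class EncoderBlock(nn.Module):
        def __init__(self, dim=768, heads=12, mlp_dim=3072):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, mlp_dim), nn.GELU(), nn.Linear(mlp_dim, dim)
            )

        def msa_block(self, x):
            x = self.norm1(x)
            return self.attn(x, x, x, need_weights=False)[0]

        def mlp_block(self, x):
            return self.mlp(self.norm2(x))

        def forward(self, x):
            # Out-of-place adds: each residual produces a NEW tensor,
            # so tensors saved for backward are never mutated.
            x = self.msa_block(x) + x
            x = self.mlp_block(x) + x
            return x

    block = EncoderBlock()
    x = torch.randn(32, 197, 768)
    block(x).sum().backward()   # runs without the RuntimeError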
As I understand it, the problem occurs during gradient computation (backward()). My question is: what exactly caused the gradient computation to fail?
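To illustrate what I mean, the same error can be reproduced without any model at all; the sketch below uses torch.sigmoid simply because its backward pass needs the op's own output:

    import torch

    x = torch.randn(3, requires_grad=True)
    y = torch.sigmoid(x)   # autograd saves y: sigmoid's grad is y * (1 - y)
    y += 1                 # in-place add bumps y's version counter
    y.sum().backward()     # RuntimeError: ... modified by an inplace operation

The "is at version 24; expected version 23" part of my error seems to refer to the same kind of version counter that this snippet trips over.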