There seems to be an accuracy problem with floating-point matrix multiplication: the same dot product gives slightly different results, and the gap depends on the roll step.
import torch

x = torch.rand((1, 5))
y = torch.rand((5, 1))
print("%.10f" % torch.matmul(x, y))
>>> 1.2710412741
# Rolling x along dim 1 and y along dim 0 by the same amount pairs
# up the same elements, so mathematically the product is unchanged.
print("%.10f" % torch.matmul(torch.roll(x, 1, 1), torch.roll(y, 1, 0)))
>>> 1.2710412741
print("%.10f" % torch.matmul(torch.roll(x, 2, 1), torch.roll(y, 2, 0)))
>>> 1.2710413933
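As a minimal illustration of what I suspect is happening (this snippet is my own, not from the code above): rolling both tensors permutes the order in which the element-wise products are summed, and floating-point addition is not associative, so a different summation order can round differently.

```python
# Floating-point addition is not associative: summing the same
# terms in a different order can produce a different result.
a, b, c = 0.1, 0.2, 0.3

left_first = (a + b) + c   # rounds to 0.6000000000000001
right_first = a + (b + c)  # rounds to 0.6
print(left_first == right_first)  # → False
```

If this is the cause, the rolled matmul is effectively computing the same sum in a different order, which would explain why some roll steps match exactly and others differ in the last digits.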
What causes the discrepancy above? How can I get a more consistent result?