I have been trying to debug a model that uses the `torch.einsum` operator in a layer that is repeated several times.
While analyzing the GPU memory usage of the model during training, I noticed that a certain einsum operation dramatically increases memory usage. I am dealing with multi-dimensional tensors, and the operation is `torch.einsum('b q f n, b f n d -> b q f d', A, B)`.
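For concreteness, here is the operation in isolation; the sizes below are placeholders I chose for illustration, not the real model dimensions:

```python
import torch

# Placeholder sizes, only for illustration.
b, q, f, n, d = 8, 128, 16, 64, 64

A = torch.randn(b, q, f, n, device='cuda', requires_grad=True)  # shape (b, q, f, n)
B = torch.randn(b, f, n, d, device='cuda', requires_grad=True)  # shape (b, f, n, d)

# Contracts over n, batched over b and f; the output has shape (b, q, f, d).
x = torch.einsum('b q f n, b f n d -> b q f d', A, B)
```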
It is also worth mentioning that:

- `x` had previously been assigned a tensor of the same shape.
- In every layer (they are all identical), GPU memory increases linearly after this operation and is not deallocated until the end of the model iteration.
I am wondering why this operation uses so much memory, and why the memory stays allocated after each pass through that layer type.
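Here is roughly how I am observing the growth. The loop below stands in for the stack of identical layers; all sizes are placeholders, and I set `d == n` so the output of one "layer" can feed the next:

```python
import torch

# Placeholder sizes, only for illustration.
b, q, f, n, d = 8, 128, 16, 64, 64

out = torch.randn(b, q, f, n, device='cuda', requires_grad=True)
# One weight tensor per "layer"; requires_grad so the backward graph is kept.
weights = [torch.randn(b, f, n, d, device='cuda', requires_grad=True) for _ in range(6)]

torch.cuda.reset_peak_memory_stats()
for i, B in enumerate(weights):
    out = torch.einsum('b q f n, b f n d -> b q f d', out, B)
    print(f"after layer {i}: {torch.cuda.memory_allocated() / 2**20:.1f} MiB allocated")
print(f"peak: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
```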