Pytorch Tensor storages have the same id when calling the storage() method

Question

I'm learning about tensor storage through a blog (in my native language, Vietnamese), and while experimenting with the examples I found something I could not explain. Given three tensors x, zzz, and x_t as below:

import torch

x = torch.tensor([[3, 1, 2],
                  [4, 1, 7]])
zzz = torch.tensor([1,2,3])
# Transpose of the tensor x 
x_t = x.t()

When I assign the storage of each tensor to its own variable, their ids differ from each other:

x_storage = x.storage()
x_t_storage = x_t.storage()
zzz_storage = zzz.storage()
print(id(x_storage), id(x_t_storage), id(zzz_storage)) 
print(x_storage.data_ptr())   
print(x_t_storage.data_ptr())

Output:

140372837772176 140372837682304 140372837768560
94914110126336
94914110126336

But when I call the storage() method on each tensor directly inside a single print statement, all three ids are identical, no matter how many times I try:

print(id(x.storage()), id(x_t.storage()), id(zzz.storage()))
# 140372837967904 140372837967904 140372837967904

The situation gets even weirder when I print them on separate lines; sometimes the results differ and sometimes they are the same:

print(id(x.storage()))
print(id(x_t.storage()))
# Output: 
# 140372837771776
# 140372837709856

So my question is: why do the storages have different ids in the first case but the same id in the second (and where does that id come from)? And what is happening in the third case?

Also, I want to ask about the method data_ptr(), which was suggested as a replacement for id in one question I saw on the PyTorch discussion forum, but the PyTorch docs give no further detail. I would be glad if anyone can give me detailed answers to any/all of the questions.

score 2 · Answer 1 · answered Mar 06 '23 at 14:13

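The Python documentation says that id() returns "the 'identity' of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value."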

Now in this case:

print(id(x.storage()), id(x_t.storage()), id(zzz.storage()))

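each of x.storage(), x_t.storage() and zzz.storage() creates a temporary object that only exists for the duration of its id() call. As soon as that call returns, nothing references the temporary anymore and it is destroyed (CPython frees it immediately through reference counting). In other words, the id of x.storage() is freed up and can then be reused for x_t.storage(), and so on.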

If you try the following:

print(x.storage() is x_t.storage())

it will return False. Here both temporaries must exist at the same time for the is comparison, so their lifetimes overlap and they cannot share an id. They are only destroyed once the comparison is done.

Now, because CPython uses the object's memory address as its id, and the allocator tends to hand just-freed memory to the next object of the same size, you get identical outputs in this case:

print(id(x.storage()), id(x_t.storage()), id(zzz.storage()))

However, as illustrated before, this does not mean that you have the same object at hand.
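The lifetime argument above can be made visible with a small pure-Python sketch. The class Temp is a hypothetical stand-in for the temporary object returned by storage() (it is not part of PyTorch), and the immediate destruction it logs relies on CPython's reference counting.

```python
class Temp:
    """Hypothetical stand-in for the temporary returned by x.storage()."""
    log = []

    def __init__(self, name):
        self.name = name
        Temp.log.append(f"create {name}")

    def __del__(self):
        Temp.log.append(f"destroy {self.name}")


# Case 1: temporaries. Each object dies as soon as its id() call returns.
temp_ids = (id(Temp("a")), id(Temp("b")))
temp_log = list(Temp.log)  # snapshot of the create/destroy order
print(temp_log)
print(temp_ids[0] == temp_ids[1])  # often True in CPython, but not guaranteed

# Case 2: references kept alive. Lifetimes overlap, so the ids must differ.
c, d = Temp("c"), Temp("d")
print(id(c) != id(d))
```

On CPython the log shows that "a" is destroyed before "b" is created, which is why "b" is free to reuse the same address, while c and d are alive together and are guaranteed distinct ids.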

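Here is another example for the same purpose:

class DummyClass:
    def dummyfunc(self):
        # Return a fresh object on every call, like storage() does
        return object()

If you test the following:

d = DummyClass()
print(id(d.dummyfunc()), id(d.dummyfunc()))

In CPython it will usually print the same id twice as well, for the same reasons discussed before. Note that the method must return a new object on each call for this to illustrate the point; a method that returns None would give equal ids trivially, because None is a single shared object.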

Now that id is explained, let's have a look at data_ptr(). As stated in the official documentation, data_ptr() returns the address of the first element of the tensor.

Please note that the language only defines id() as a unique object identifier, not as a memory address. CPython happens to use the address of the Python object, but that is an implementation detail, and it is the address of the Python wrapper object, not of the tensor data. It should therefore not be your reference in the general case.
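As a rough illustration of "address of the first element", the sketch below compares data_ptr() on the tensor x from the question, on its transpose, and on a row view. The variable names row1 and offset_bytes are only for illustration.

```python
import torch

x = torch.tensor([[3, 1, 2],
                  [4, 1, 7]])
x_t = x.t()   # transpose: a view whose first element is the same as x's
row1 = x[1]   # view of the second row; no data is copied

# The transpose starts at the same element, so the addresses match.
print(x_t.data_ptr() == x.data_ptr())

# row1 starts one full row into the buffer, so its address is shifted by
# storage_offset() elements of element_size() bytes each.
offset_bytes = row1.data_ptr() - x.data_ptr()
print(offset_bytes, row1.storage_offset() * x.element_size())
```

So two tensors can share memory and still report different data_ptr() values when one of them is a view that starts at a later element.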

In order to check if two tensors share the same underlying memory, you could compare the following value for each of them:

tensor_var.storage().data_ptr()

If tensor_var1 and tensor_var2 return the same storage pointer, then they share the data memory.
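A minimal sketch of that check, using the tensors from the question. The helper names storage_ptr and same_storage are hypothetical. The sketch assumes that recent PyTorch versions (2.x) expose untyped_storage() and discourage storage(), so it prefers the former and falls back to the latter.

```python
import torch


def storage_ptr(t):
    """Address of the start of the memory block backing tensor t."""
    # Assumption: untyped_storage() is available on PyTorch 2.x;
    # older versions only have storage().
    if hasattr(t, "untyped_storage"):
        return t.untyped_storage().data_ptr()
    return t.storage().data_ptr()


def same_storage(a, b):
    """True if both tensors are backed by the same memory block."""
    return storage_ptr(a) == storage_ptr(b)


x = torch.tensor([[3, 1, 2],
                  [4, 1, 7]])
zzz = torch.tensor([1, 2, 3])
x_t = x.t()

print(same_storage(x, x_t))   # transpose is a view of x
print(same_storage(x, x[1]))  # row view starts later but uses the same block
print(same_storage(x, zzz))   # independent tensor
```

Comparing the storage pointer rather than the tensor's own data_ptr() also catches views such as x[1], which start at a different element of the same block.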

I hope it helps.

Make sense! I had some similar ideas of that `id()` behavior before, and your answer made it clear. Thanks a lot! — Đào Minh Dũng, Mar 16 '23 at 01:59

score 1 · Answer 2 · answered May 03 '21 at 18:01

After searching on the PyTorch discussion forum and Stack Overflow, I see that the method data_ptr() should be used to compare the memory locations of tensors (according to the PyTorch discuss thread in the question and this link), although it is not totally correct; check the first PyTorch discuss thread for a better comparison method.

About the id part, there have been many questions on this topic on Stack Overflow. I saw one question here whose many answers clear up most of the question above. I also had some misunderstandings about id and the memory allocation of objects, which were answered in the comment section of my recent question.
