
Two Python objects have the same id, but the "is" operator returns False, as shown below:

import numpy as np

a = np.arange(12).reshape(2, -1)
c = a.reshape(12, 1)  # a view onto the same data as a
print("id(c.data)", id(c.data))
print("id(a.data)", id(a.data))

print(c.data is a.data)
print(id(c.data) == id(a.data))

Here is the actual output:

id(c.data) 241233112
id(a.data) 241233112
False
True

My question is: why does "c.data is a.data" return False even though both have the same id and thus, I assumed, refer to the same object? I thought two names point to the same object if the ids are equal. Am I wrong? Thank you!

user2357112
drminix
  • Even though the `id` is the same, the memory addresses are different, try `print(c.data)` and `print(a.data)` – C.Nivs Apr 12 '19 at 19:26
  • This is definitely a duplicate, but I can't find any... – Aran-Fey Apr 12 '19 at 19:29
  • @C.Nivs They don't even necessarily have different memory addresses (something which Python doesn't expose). Whatever memory was used for the first may have been reused for the second. – chepner Apr 12 '19 at 19:29
  • @chepner you're right, on subsequent calls the memory addresses are re-used – C.Nivs Apr 12 '19 at 19:31
  • @C.Nivs Don't think of it in terms of memory addresses. How memory is managed is *completely* implementation dependent. All you know for sure is that two objects that overlap in time will not have the same id. – chepner Apr 12 '19 at 19:32
  • @Aran-Fey, that's okay; a good question (though asked before) can sometimes be resurrected for a fruitful discussion – amanb Apr 12 '19 at 19:35
  • @chepner that clears that up immensely. So if I were to store an object as a variable in memory, that `id` is taken and won't be assigned to any other variable unless I pass a reference from that object to the other (if I can do that) – C.Nivs Apr 12 '19 at 19:35
  • @C.Nivs no, *ids do not belong to variables*. They belong to *objects*. Many variables can reference the same object. – juanpa.arrivillaga Apr 12 '19 at 19:55
  • @juanpa.arrivillaga I think my terminology is off, that's what I meant by the `id` won't be re-used unless the reference is shared between two variables, in which case I think "re-used" is the wrong term here. thanks for the clarification – C.Nivs Apr 12 '19 at 19:59
  • References aren't shared between two variables; each variable *is* a reference to the object it refers to. – chepner Apr 12 '19 at 20:07
  • @C.Nivs yes, you need to understand, Python variables are not like C variables. The best way is to think of a Python variable as *literally a key in a dict*. This is actually true for the global scope, but modern python optimizes local scopes as essentially arrays/symbol tables. In any case, just as you can have 100 references to the same *object* in a dict, a variable can reference the same object, or a different object, and the ID belongs to the object, not the variable. – juanpa.arrivillaga Apr 12 '19 at 20:17
  • @juanpa.arrivillaga As I'm not a C programmer, I don't think I was conflating the two, just trying to get my understanding straight here. Again, I think semantically I'm just not using the terminology right, I still have a ton to learn in this space. The reason I said *sharing a reference* is in regard to referencing a mutable object like if I assigned `a = some_list; b = a`, `b` and `a` are different *variables* but they reference the same object – C.Nivs Apr 12 '19 at 20:34
  • @C.Nivs sure, but note, mutable/immutable is irrelevant. *The semantics of python assignment are exactly the same regardless of type*. `a = some_tuple; b = a` both `a` and `b` refer to the same object, which happens to be immutable. – juanpa.arrivillaga Apr 12 '19 at 21:18
  • @C.Nivs Perhaps you should read Ned Batchelder's [Facts and myths about Python names and values](https://nedbatchelder.com/text/names.html). – PM 2Ring Apr 13 '19 at 00:35
  • This is a duplicate of [id() vs `is` operator. Is it safe to compare `id`s? Does the same `id` mean the same object?](https://stackoverflow.com/questions/52268343/id-vs-is-operator-is-it-safe-to-compare-ids-does-the-same-id-mean-the), but as per my long-running dialog with @ivan-pozdeev over there, that question needs tons more real-world examples in Python where objects (/attributes/views) we might expect are the same or different, are actually the opposite, or implementation-dependent. – smci Apr 13 '19 at 02:00
  • **What were you intending to test with `is` operator on the `.data` attribute?** `a is c` gives False, telling you the *arrays* themselves are different (different dimensions, in this particular case). But they can be built on the same underlying data, hence `a.data` view gives the same as `c.data`. Are you just poking at numpy's memory internals, or were you trying to test some programming property, if so what? (Don't use `is` to test numerical equality, for example) – smci Apr 13 '19 at 02:06
  • @PM2Ring Just finished reading, thanks for the link, it really put the things that juanpa.arrivillaga and chepner were explaining into perspective – C.Nivs Apr 13 '19 at 17:20

2 Answers


Each access to a.data or c.data produces a new, transient object with no other reference to it. As such, each is reclaimed as soon as it is no longer needed, and the same id can be reused for both.

In your first check (c.data is a.data), both objects have to co-exist while is tests whether they are identical, which they are not.

In the second check (id(c.data) == id(a.data)), each object is released as soon as id returns its id.

If you save references to both objects, keeping them alive, you can see they are not the same object.

# a and c are the arrays from the question
r0 = a.data
r1 = c.data
assert r0 is not r1
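
The lifetime argument can be reproduced without NumPy. The following is only a minimal sketch: the class Holder and its data property are hypothetical stand-ins for an array and its .data attribute, assuming (as described above) that every access builds a new object.

```python
class Holder:
    """Hypothetical stand-in for an ndarray: `data` builds a new object on every access."""

    def __init__(self):
        self._buf = bytearray(8)

    @property
    def data(self):
        # A fresh memoryview on each access.
        return memoryview(self._buf)


h = Holder()

# Both temporaries must be alive during the comparison, so they are distinct objects.
same_object = h.data is h.data

# Each temporary can be released once id() has returned, so the interpreter may
# (but need not) hand out the same id twice. The result is implementation dependent.
ids_match = id(h.data) == id(h.data)

# Keeping references alive guarantees distinct ids.
r0 = h.data
r1 = h.data
distinct_ids = id(r0) != id(r1)

print(same_object, distinct_ids)  # False True
```

The value of ids_match is deliberately not printed as an expected result: whether the id is reused depends on the interpreter and on what else has been allocated in between.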
chepner
  • what is confusing is the fact that `data` looks like an attribute, but is probably a property – Jean-François Fabre Apr 12 '19 at 19:27
  • In my tests, the id's are different in the first run, but change to become the __same__ on subsequent runs. – amanb Apr 12 '19 at 19:30
  • @Jean-FrançoisFabre so would that mean that the object itself is only returned when a getter is called, and the property is not actually stored in the class? I'm not quite familiar with the differences between a property vs attribute – C.Nivs Apr 12 '19 at 19:30
  • a property is a method disguised as an attribute. So it can return a discardable integer, object, whatever. – Jean-François Fabre Apr 12 '19 at 19:31
  • Thank you all! Coming from C/C++, I was just looking for a way to check if two different pointers point to the same object. So I should use the "is" operator to check whether two pointers point to the same object. id() can return the same value since it can be re-used for transient objects. Thanks – drminix Apr 13 '19 at 06:12
  • Say "names" rather than pointers; it's best to understand how variables work in Python rather than try to apply any understanding you bring from another language. See https://nedbatchelder.com/text/names.html for a good overview. `is` (and `id`) gets used far less often than you might expect. – chepner Apr 13 '19 at 12:20
In [62]: a = np.arange(12).reshape(2,-1) 
    ...: c = a.reshape(12,1)                                                    

.data returns a memoryview object. id just gives the id of that object; it is not the value of the object, nor any indication of where the data buffer is located.

In [63]: a.data                                                                 
Out[63]: <memory at 0x7f672d1101f8>
In [64]: c.data                                                                 
Out[64]: <memory at 0x7f672d1103a8>
In [65]: type(a.data)                                                           
Out[65]: memoryview

https://docs.python.org/3/library/stdtypes.html#memoryview
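
To make the distinction between the wrapper and the buffer concrete, here is a small standard-library sketch (the names buf, m1 and m2 are illustrative only): two memoryview objects over one bytearray are distinct wrapper objects, yet both expose the same underlying buffer.

```python
buf = bytearray(b"abcdefgh")

# Two separate memoryview wrappers around the same buffer.
m1 = memoryview(buf)
m2 = memoryview(buf)

print(m1 is m2)          # False: two distinct wrapper objects
print(m1.obj is m2.obj)  # True: both wrap the same bytearray

# A write through one view is visible through the other.
m1[0] = ord("z")
print(bytes(m2[:3]))     # b'zbc'
```

So the identity (or id) of a memoryview says nothing about whether two views share data; the obj attribute or the contents do.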

If you want to verify that a and c share a data buffer, I find the __array_interface__ to be a better tool.

In [66]: a.__array_interface__['data']                                          
Out[66]: (50988640, False)
In [67]: c.__array_interface__['data']                                          
Out[67]: (50988640, False)

It even shows the offset produced by slicing: here 24 bytes (3 elements × 8 bytes each).

In [68]: c[3:].__array_interface__['data']                                      
Out[68]: (50988664, False)
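
The same check can be scripted rather than read off by eye. This is a hedged sketch, assuming NumPy is installed: np.shares_memory reports whether two arrays overlap in memory, and the first element of __array_interface__['data'] is the starting address of the array's buffer. The variable names addr_a, addr_c, addr_slice and b are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(2, -1)
c = a.reshape(12, 1)

# True when the two arrays overlap in memory.
views_share = np.shares_memory(a, c)

# Starting address of each array's data buffer.
addr_a = a.__array_interface__["data"][0]
addr_c = c.__array_interface__["data"][0]
same_start = addr_a == addr_c

# Dropping the first three rows moves the start by 3 * itemsize bytes.
addr_slice = c[3:].__array_interface__["data"][0]
offset_ok = (addr_slice - addr_c) == 3 * c.itemsize

# An explicit copy owns its own buffer.
b = a.copy()
copy_shares = np.shares_memory(a, b)

print(views_share, same_start, offset_ok, copy_shares)
```

Using c.itemsize instead of a hard-coded 8 keeps the offset check valid on platforms where the default integer type is not 8 bytes wide.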

I haven't seen much use of a.data. It can be used as the buffer object when creating a new array with ndarray:

In [70]: d = np.ndarray((2,6), dtype=a.dtype, buffer=a.data)                    
In [71]: d                                                                      
Out[71]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11]])
In [72]: d.__array_interface__['data']                                          
Out[72]: (50988640, False)

But normally we create new arrays with shared memory by slicing or with np.array(..., copy=False).

hpaulj