3

The built-in "is" operator behaves strangely for elements of an np.ndarray.

Although the ids of the lhs and the rhs are the same, the "is" operator returns False (this behavior is specific to np.ndarray).

import numpy as np

a = np.array([1.,])
b = a.view()
print(id(a[0]) == id(b[0]))  # True
print(a[0] is b[0])  # False

This strange behavior occurs even without a view.

a = np.array([1.,])
print(a[0] is a[0])  # False

Does anyone know the mechanism of this strange behavior (and possibly the evidence or specification)?

Postscript: Please reconsider the following two examples.

  1. If this is a list, this phenomenon is not observed.
a = [0., 1., 2.,]
b = []
b.append(a[0])
print(a[0] is b[0])  # True
  2. a[0] and b[0] refer to the exact same object.
a = np.array([1.,])
b = a.view()
b[0] = 0.
print(a[0])  # 0.0
print(id(a[0]) == id(b[0]))  # True

Note: This question may be a duplicate, but I'm still a bit confused.

a = np.array([1.,])
b = a.view()
x = a[0]
y = b[0]
print(id(a[0]))  # 139746064667728
print(id(b[0]))  # 139746064667728
print(id(a[0]) == id(b[0])) # True
print(id(a[0]) == id(x)) # False
print(id(x) == id(y))  # False
  1. Is a[0] a temporary object?
  2. Is the id of a temporary object reused?
  3. Doesn't this contradict the specification? (https://docs.python.org/3.7/reference/expressions.html#is)
6.10.3. Identity comparisons
The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. Object identity is determined using the id() function. x is not y yields the inverse truth value.
  4. If the id is reused for temporary objects, why are the ids different in this case?
>>> id(100000000000000000 + 1) == id(100000000000000001)
True
>>> id(100000000000000000 + 1) == id(100000000000000000)
False
Yuki Hashimoto
  • Possible duplicate of [id() vs `is` operator. Is it safe to compare `id`s? Does the same `id` mean the same object?](https://stackoverflow.com/questions/52268343/id-vs-is-operator-is-it-safe-to-compare-ids-does-the-same-id-mean-the) – ivan_pozdeev Feb 01 '19 at 08:33
  • No, this is not a duplicate question. That question discussed a temporary object whose lifetime we cannot estimate. In this case, however, the object in question is a memory buffer, which can be confirmed even with memory-checking tools. – Yuki Hashimoto Feb 01 '19 at 08:46
  • `a[0]` does not give direct access to the databuffer of `a`. – hpaulj Feb 01 '19 at 09:32
  • Re: edit: as per the duplicate, the result of `id(foo()) == id(bar())` is undefined behavior, and for `int` specifically, there's another factor at play. – ivan_pozdeev Feb 01 '19 at 09:43

5 Answers

3

This is simply due to the difference between how is and == work: the is operator doesn't compare values, it only checks whether the two operands refer to the same object.

For example if you do:

print(a is a)

The output will be True. For more information, look up here.

When Python evaluates the comparison, each a[0] operand is a newly created object at its own address. The same behaviour can be observed with a simple test using the id function.

print(id(a[0]),a[0] is a[0],id(a[0]))

The output will be:

140296834593128 False 140296834593248

As for your additional question of why lists don't behave the way numpy arrays do: it comes down to how they are constructed. NumPy arrays were designed to be more efficient in processing and in storage than a normal Python list.

So every time you load an element from a numpy array, or perform an operation on it, the result is a new object with a different id, as you can observe from the following code:

a = np.array([0., 1., 2.,])
b = []
b.append(a[0])
print(id(a[0]),a[0] is b[0],id(b[0]))

Here are the outputs of multiple re-runs of the same code in jupyter-lab:

140296834595096 False 140296834594496
140296834595120 False 140296834594496
140296834595120 False 140296834594496
140296834595216 False 140296834594496
140296834595288 False 140296834594496

Notice something strange? The id of the numpy array element is different on each re-run, whereas the id of the list element stays the same. This explains the strange behaviour of numpy arrays in your question.
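A minimal sketch of what those runs suggest, assuming nothing beyond NumPy itself: the list keeps a reference to the one numpy.float64 object that a[0] produced at append time, while every later a[0] builds another object. The variable names list_same, array_same and values_equal are only illustrative.

```python
import numpy as np

a = np.array([0., 1., 2.])
b = []
b.append(a[0])   # a[0] builds one numpy.float64 object; the list stores a reference to it

# The list hands back the stored object itself, so identity holds.
list_same = b[0] is b[0]

# A new a[0] is a fresh object, so it cannot be the one stored in the list.
array_same = a[0] is b[0]

# The values still compare equal.
values_equal = a[0] == b[0]

print(type(b[0]).__name__, list_same, array_same, values_equal)
```

So the difference is not that the list "remembers" an id, but that indexing a list returns an existing object while indexing an array creates one.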

If you want to read more on this behaviour, I suggest the numpy docs.

Inder
  • No, I do not refer to that difference. Remember, Python's "is" (usually) returns True when the lhs and rhs's ids are the same. For example, a = [0, 1, 2]; b = []; b.append(a[0]); print(a[0] is b[0]) # True. This is only specific to numpy. – Yuki Hashimoto Feb 01 '19 at 08:26
  • @YukiHashimoto if the ids are the same then it should be True; please elaborate a little more on what you are saying – Inder Feb 01 '19 at 08:28
  • Thanks again. Please try to reproduce the code in my question. The ids of a[0] and "another" a[0] are the same, but the is operator returns False. – Yuki Hashimoto Feb 01 '19 at 08:34
  • @YukiHashimoto that is because numpy arrays and lists are quite different from each other in how they are called; I will edit my answer in a minute to explain. – Inder Feb 01 '19 at 08:42
  • I finally understand what is happening. However, I'm still not certain: doesn't this contradict the Python documentation? – Yuki Hashimoto Feb 01 '19 at 09:08
  • @YukiHashimoto can you please point out the specific part that you think is contradicted – Inder Feb 01 '19 at 09:18
  • The documentation says "Object identity is determined using the id() function.", but even if the ids are accidentally the same, `is` returns False, doesn't it? – Yuki Hashimoto Feb 01 '19 at 09:26
  • I doubt that two objects with different lifetimes can have the same id. Crudely speaking, an id is like a memory address, and two things can't be at the same place at a given time. – Inder Feb 01 '19 at 09:53
1

a[0] is of type <class 'numpy.float64'>. When you do the comparison, it creates two instances of that class, so the is check fails. However, if you do the following you will get what you wanted, because now both operands reference the same object.

x = a[0]
print(x is x)  # True
xashru
  • Thanks for the comment. However, in the first case, a[0] and b[0] are not different objects. For example, a = np.array([1.,]); b = a.view(); b[0] = 0; print(a[0]) # 0. – Yuki Hashimoto Feb 01 '19 at 08:27
  • No, `a[0]` and `b[0]` are still different numpy objects. They reference, or unbox, the same value in `a`. For a list `alist[0]` is the actual object in the list, so id's match. But the 'contents' of array `a` is a flat data buffer. `a[0]` is not an item in the buffer. – hpaulj Feb 01 '19 at 17:51
1

This is covered by "id() vs `is` operator. Is it safe to compare `id`s? Does the same `id` mean the same object?". In this particular case:

  1. a[0] and b[0] are created anew each time

    In [7]: a[0] is a[0]
    Out[7]: False
    
  2. In id(a[0]) == id(b[0]), each object is immediately discarded after taking its id, and b[0] just happened to take up the id of the recently-discarded a[0]. Even if this happens each time in your version of CPython for this particular expression (due to a specific evaluation order and heap organization), this is an implementation detail and you can't rely on it.
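The two points above can be checked with a short sketch. It assumes only NumPy; the names distinct_ids, distinct_objects and temporaries_match are illustrative. Binding the scalars to names keeps them alive, which is what makes the id comparison meaningful.

```python
import numpy as np

a = np.array([1.])
b = a.view()

# Keep both scalar objects alive by binding them to names.
x = a[0]
y = b[0]

# Two objects that exist at the same time can never share an id.
distinct_ids = id(x) != id(y)
distinct_objects = x is not y

# Without a reference, each temporary may be freed as soon as id() returns,
# so this result is an implementation detail and must not be relied on.
temporaries_match = id(a[0]) == id(b[0])

print(distinct_ids, distinct_objects)
```

No expected value is given for temporaries_match on purpose: it depends on the interpreter and its allocator.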

ivan_pozdeev
0

Numpy stores array data as a raw data buffer. When you access the data like a[0], it reads from the buffer and constructs a python object for it. Thus, calling a[0] twice will construct 2 python objects. is checks for identity, so 2 different objects will compare false.
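As a minimal sketch of that description (assuming only NumPy; first and second are illustrative names), two reads of the same element give two separate scalar objects with equal values, and neither is tied to the buffer afterwards:

```python
import numpy as np

a = np.array([1.])

first = a[0]    # reads the buffer and builds a numpy.float64 object
second = a[0]   # reads the buffer again and builds another one

print(type(first))        # <class 'numpy.float64'>
print(first is second)    # False: two separate objects built from the same buffer
print(first == second)    # True: they hold the same value

# Changing the buffer afterwards does not touch the objects already built.
a[0] = 5.0
print(first, a[0])        # 1.0 5.0
```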

This illustration should make the process much clearer:

NOTE: the id numbers here are sequential purely for illustration; in practice you'd get arbitrary-looking numbers. The repeated id 3 in the example will not necessarily be the same number every time either. It is merely possible that it is, because id 3 is repeatedly freed and thus reusable.

a = np.array([1.,])
b = a.view()
x = a[0]    # python reads a[0], creates new object id 1.
y = b[0]    # python reads b[0] which reads a[0], creates new object id 2. (1 is used by object x)

print(id(a[0]))  # python reads a[0], creates new object id 3.
                 # After this call, the object id 3 a[0] is no longer used.
                 # Its lifetime has ended and id 3 is freed.

print(id(b[0]))  # python reads b[0] which reads a[0], creates new object id 3. 
                 # id 3 has been freed and is reusable.
                 # After this call, the object id 3 b[0] is no longer used.
                 # Its lifetime has ended and id 3 is freed (again).

print(id(a[0]) == id(b[0])) # This runs in 2 steps.
                            # First id(a[0]) is run. Just like above, this creates an object with id 3.
                            # Then a[0] is disposed of since no references to it remain. id 3 is freed again.
                            # Then id(b[0]) is run. Again, it creates an object with id 3 (since id 3 is free).
                            # So, id(a[0]) == 3, id(b[0]) == 3. They are equal.

print(id(a[0]) == id(x)) # As above, id(a[0]) can create an object with id 3,
                         # while x keeps its reference to the id 1 object. 3 != 1.

print(id(x) == id(y))  # x references the id 1 object, y references the id 2 object. 1 != 2

Regarding

>>> id(100000000000000000 + 1) == id(100000000000000001)
True
>>> id(100000000000000000 + 1) == id(100000000000000000)
False

id allocation and garbage collection are implementation details. What is guaranteed is that, at a single point in time, two different objects have different ids and two references to the same object yield the same id. The problem is that some expressions are not atomic (i.e. they do not run at a single point in time), so an object can be freed partway through.

Python may reuse freed id numbers or not, as it wishes, depending on the implementation. In this case, the ids matched in one expression and not in the other. (In CPython this is likely the compiler at work: 100000000000000000 + 1 is folded to the constant 100000000000000001 at compile time, and equal constants within one compiled statement can share a single object, so both id calls see the same live object. In the second expression the two constants differ and both exist at once, so their ids must differ.)
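A small sketch of the one guarantee that does hold, using plain object() instances rather than NumPy scalars (the names alive, all_distinct, freed_id and maybe_reused are illustrative): ids are unique among objects that exist at the same time, and nothing is promised once an object has been freed.

```python
# 1000 objects that are all alive at once must have 1000 distinct ids.
alive = [object() for _ in range(1000)]
all_distinct = len({id(o) for o in alive}) == len(alive)

# Once an object is freed, its id may or may not be handed out again.
freed_id = id(object())               # the temporary is released after id() returns
later = object()
maybe_reused = id(later) == freed_id  # implementation detail: either result is valid

print(all_distinct)
```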

Eric
  • OK, but why is their id the same? – Yuki Hashimoto Feb 01 '19 at 08:32
  • I'm not entirely sure of that (in the sense that I don't know how to confirm it), but I'd expect it's the same case as the question linked by ivan. The lifetime of `a[0]` ended after the first `id()` was called, so `b[0]` is able to reuse the id of `a[0]`. – Eric Feb 01 '19 at 08:39
  • I don't have access to numpy right now. What if you store `x=a[0]` and `y=b[0]` as variables? In this case, the lifetime of `x`/`a[0]` would not have ended yet, so the `id()` should be different. – Eric Feb 01 '19 at 08:40
  • "Partially yes and partially no" was the fact. Even when x = a[0]; y = b[0], the ids of a[0] and b[0] were the same, although the ids of x and y were different from them. – Yuki Hashimoto Feb 01 '19 at 09:02
  • ? you mean you get `id(a[0]) == id(b[0])` AND, with `x=a[0]`, `y=b[0]`, `id(x) == id(y)`? `id(a[0]) == id(b[0])` would not make any difference. Again, the a[0] in `id(a[0]) == id(b[0])` is a different object from the a[0] in `x=a[0]` – Eric Feb 01 '19 at 09:41
  • I edited my answer to illustrate this step by step; I believe it should clear up any confusion. – Eric Feb 01 '19 at 09:53
0

A big part of the confusion here is the nature of a[0] in the case of an array.

For a list, b[0] is an actual element of b. We can illustrate this by making a list of mutable items (other lists):

In [22]: b = [[0],[1],[2],[3]]
In [23]: b1 = b[0]
In [24]: b1
Out[24]: [0]
In [25]: b[0].append(10)
In [26]: b
Out[26]: [[0, 10], [1], [2], [3]]
In [27]: b1
Out[27]: [0, 10]
In [28]: b1.append(20)
In [29]: b
Out[29]: [[0, 10, 20], [1], [2], [3]]

Mutating b[0] and b1 act on the same object.

For an array:

In [35]: a = np.array([0,1,2,3])
In [36]: c = a.view()
In [37]: a1 = a[0]
In [38]: a += 1
In [39]: a
Out[39]: array([1, 2, 3, 4])
In [40]: c
Out[40]: array([1, 2, 3, 4])
In [41]: a1
Out[41]: 0

An in-place change to a does not change a1, even though it did change c.

__array_interface__ shows us where the databuffer for an array is stored - think of it, in a loose sense, as the memory address of that buffer.

In [42]: a.__array_interface__['data']
Out[42]: (31233216, False)
In [43]: c.__array_interface__['data']
Out[43]: (31233216, False)
In [44]: a1.__array_interface__['data']
Out[44]: (28513712, False)

The view has the same databuffer. But a1 does not. a[0:1] is a single element view of a, and does share the data buffer.

In [45]: a[0:1].__array_interface__['data']
Out[45]: (31233216, False)
In [46]: a[1:2].__array_interface__['data']  # 8 bytes over
Out[46]: (31233224, False)

So id(a[0]) tells us next to nothing about a. Comparing ids only tells us something about how memory slots are recycled, or not, when constructing Python objects.
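If the goal is to learn whether two arrays use the same data, a more direct check than comparing ids is np.shares_memory. The following is only a sketch along the lines of the session above (the name s for the one-element slice is illustrative):

```python
import numpy as np

a = np.array([0, 1, 2, 3])
c = a.view()
s = a[0:1]      # one-element slice: still a view into a's buffer
a1 = a[0]       # scalar: a separate object holding a copy of the value

print(np.shares_memory(a, c))   # True
print(np.shares_memory(a, s))   # True

a += 1          # in-place change to the shared buffer
print(c[0], s[0], a1)           # 1 1 0
```

The view and the slice follow the in-place change; the scalar taken earlier keeps its old value.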

hpaulj