Why is the id comparison of items from two different numpy arrays True?

Question

import numpy as np
array_1 = np.array([1,2,3])
array_2 = np.array([4,5,6])

print(id(array_1))
print(id(array_2))
print(f"array id comparison = {id(array_1)==id(array_2)}")

print(id(array_1[0]))
print(id(array_2[0]))
print(f"array item  id comparison = {id(array_1[0])==id(array_2[0])}")

Output:

553196994064
553197145904
array id comparison = False
553211404432
553211405200
array item  id comparison = True

The two item ids printed above are different, so why does comparing the ids of the array items evaluate to True?

is this useful? https://stackoverflow.com/questions/49855586/numpy-array-memory-id — frab, Mar 14 '22 at 18:47
Saving the `int` used to represent the ID doesn't prevent another object from using the same ID. — chepner, Mar 14 '22 at 18:55
Using `id` on lists can be useful, since lists store values by reference. It is virtually useless when working with `numpy` arrays. — hpaulj, Mar 14 '22 at 18:55
Assigning the value *itself* (`a1 = array_1[0]; a2 = array_2[0]`) will cause `id(a1) == id(a2)` to be false, though, since the lifetimes of both objects now overlap. — chepner, Mar 14 '22 at 18:56
This is just an accident of implementation and won't always be true. `array_1[0]` is freed, and it just so happens that its memory is usually available again for the first new allocation made by `array_2[0]`. But that's just the memory manager. Suppose, for instance, that another thread interrupts processing between these two operations: something else will get that memory location. — tdelaney, Mar 14 '22 at 19:07

timgeb · Answer 1 · 2022-03-14T20:46:29.847

2

A new object (of type numpy.int32) is created each time you access the array elements directly:

>>> id(array_1[0])
3045147529104
>>> id(array_1[0])
3045147529008
>>> id(array_1[0])
3045147527984

When you issue

id(array_1[0]) == id(array_2[0])

after getting the first id, the first object is deleted because there are no references pointing to it, making it unreachable. The second object is then created at the same memory address (and therefore with the same id). Both of these steps happen before the equality operator is evaluated.

When you assign a name to the first object you can prevent its deletion, because now it has a reference pointing to it.

>>> id(array_1[0]) == id(array_2[0])
True
>>> id(x := array_1[0]) == id(array_2[0])
False
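As a minimal sketch of the same idea (the names `a1`, `a2`, `fresh_each_time` and `ids_differ` are only illustrative), binding both scalars to names keeps them alive at the same time, so they cannot share an address. This assumes CPython and a standard NumPy install:

```python
import numpy as np

array_1 = np.array([1, 2, 3])
array_2 = np.array([4, 5, 6])

# Each indexing operation builds a fresh NumPy scalar object, so two
# accesses of the same element are never the same Python object.
fresh_each_time = array_1[0] is not array_1[0]
print(fresh_each_time)

# Bind both scalars to names so their lifetimes overlap.
a1 = array_1[0]
a2 = array_2[0]

# Two objects that are alive at the same time always have different ids.
ids_differ = id(a1) != id(a2)
print(ids_differ)

# To compare the stored numbers, compare values rather than ids.
print(a1 == a2)
```

The id comparison only becomes meaningful once both objects are guaranteed to exist simultaneously. For comparing the numbers themselves, `==` is the right tool.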

edited Mar 14 '22 at 20:46

answered Mar 14 '22 at 18:52

timgeb

`array_1[0]` doesn't return an `int`; it returns a `numpy.int64` value, which isn't subject to interpreter-level interning. – chepner Mar 14 '22 at 18:54
Minor issue, but that isn't garbage collection. CPython deletes objects when their reference counts reach zero. That's what happens here. Garbage collection only happens periodically and looks for circular references. – tdelaney Mar 14 '22 at 19:01

score 1 · Answer 2 · answered Mar 14 '22 at 18:51

1

array_1[0] and array_2[0] both create new objects; they do not simply return references to existing Python objects. As such, the object returned by array_1[0] can be garbage collected as soon as id(array_1[0]) returns, which means the ID previously allocated to that object is free to be reused by the object returned by array_2[0].
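The immediate release of the temporary can be observed with a small standard-library sketch. Here `Scalar` and `make_temp` are hypothetical stand-ins for the object that `array_1[0]` creates, not NumPy internals, and the behaviour shown is specific to CPython's reference counting:

```python
import gc
import weakref


class Scalar:
    """Hypothetical stand-in for the temporary created by array_1[0]."""


def make_temp(log):
    obj = Scalar()
    # Append a marker to `log` at the moment the object is destroyed.
    weakref.finalize(obj, log.append, "freed")
    return obj


was_enabled = gc.isenabled()
gc.disable()  # switch off the cyclic collector; reference counting remains
try:
    log = []
    # The temporary has no references left once id() has returned.
    first_id = id(make_temp(log))
    freed_immediately = (log == ["freed"])
finally:
    if was_enabled:
        gc.enable()

print(freed_immediately)
```

With the cyclic collector disabled, the temporary is still destroyed as soon as `id()` returns, which is what leaves its address free for the next object. Other Python implementations may release the object later.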

answered Mar 14 '22 at 18:51

chepner

Minor, but it's not garbage collection, which only happens periodically and only affects circular references. This is just vanilla reference counting. – tdelaney Mar 14 '22 at 19:03
It's all garbage collection. Reference counting is the primary algorithm used by CPython to collect objects as soon as possible. The Garbage Collector is the secondary algorithm used to handle reference cycles. See https://devguide.python.org/garbage_collector/#design-of-cpython-s-garbage-collector. – chepner Mar 14 '22 at 19:16
The Python documentation is careful not to mix the two. In the [gc docs](https://docs.python.org/3/library/gc.html) for example, it says _the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles._ Python implements circular garbage collection, not general garbage collection. It's a shame that a core developer would confuse the issue, but there it is. – tdelaney Mar 14 '22 at 19:41
If anything, the gc docs just failed to differentiate between garbage-collection-the-technique and Garbage-Collector-the-specific-algorithm. "Python" doesn't implement anything; CPython uses a combination of reference counting and whatever Garbage Collector is, while Jython just relied on the underlying JVM to handle garbage collection. – chepner Mar 14 '22 at 19:44
