As far as I know, using "=" to copy an object actually just creates another reference to the same object. So if I do

a = [1, 2, 3]
b = a
print(id(a), id(b), a is b)

my output is 2367729946880 2367729946880 True, which is fine and obvious.

If I make copies of the list, they have different ids:

a = [1, 2, 3]
b = a.copy()
c = a.copy()
print(id(b), id(c), b is c)

Output: 2646790648192 2646790705984 False.

So far so good. However, if I create the copies directly inside the print call, they unexpectedly have the same id:

a = [1, 2, 3]
print(id(a.copy()), id(a.copy()))

Output: 2209221063040 2209221063040

How does it happen?

I tried a bunch of different stuff, like:

  • assigning the copies to variables on the same line, in case there is some single-line optimization (Python being an interpreted language)
a = [1, 2, 3]
b, c = a.copy(), a.copy()
print(id(b), id(c))

Output: 2545996280192 2545996337984

  • passing the copies to a function to avoid using "="
def f(a, b):
    print(id(a), id(b))

c = [1, 2, 3]
f(c.copy(), c.copy())

Output: 1518673867136 1518673852736

  • passing the copies to a function that uses *args because, as far as I know, print() receives its arguments the same way:
def f(*args):
    print(id(args[0]), id(args[1]))

c = [1, 2, 3]
f(c.copy(), c.copy())

Output: 1764444352896 1764444338496 (the ids are close, but different)

None of these reproduce the behaviour. Even comparing the copies with the "is" operator prints False:

a = [1, 2, 3]
print(a.copy() is a.copy())

Output: False

But comparing the ids with "==" still gives True:

a = [1, 2, 3]
print(id(a.copy()) == id(a.copy()))

Output: True


To sum up, I wonder:

  1. What causes this behaviour? It doesn't seem intended. Is it the result of some optimization?
  2. Can it lead to nasty, unexpected bugs? Is there another way for two copies to end up with the same id?
jonrsharpe
botanich
  • This is unrelated to your question, but the pedant in me wants to point out that "Python is an interpreted language" isn't true: Python is compiled to bytecode. – SuperStormer Mar 09 '23 at 20:44
  • Even more pedantic, no language is interpreted or compiled; a language just *is*. Any language can be used as the input to an interpreter or compiler. In the case of CPython (the reference implementation of Python), both approaches are used: the Python source is first compiled to bytecode, then the bytecode is interpreted by a virtual machine. Other Python implementations can do something different. (Jython, for example, compiled Python source to Java bytecode.) – chepner Mar 09 '23 at 20:50

1 Answer

id returns an integer that is guaranteed to be unique only for the lifetime of the object. Here, that id got re-used because the lifetimes of the two objects did not overlap. In the expression:

print(id(a.copy()), id(a.copy()))

First, a.copy() is evaluated and creates a new list. That list is passed to id, which returns an integer. The list is then no longer referenced and is immediately reclaimed (this is a CPython implementation detail, a consequence of reference counting). Then a.copy() is evaluated again, and the new list is again passed to id. It can return the same int, because the new list may occupy the memory that was just freed, and that is perfectly in line with the documented behaviour of id. You can look at the disassembled bytecode to see the order of operations:

>>> import dis
>>> dis.dis("print(id(a.copy()), id(a.copy()))")
  0           0 RESUME                   0

  1           2 PUSH_NULL
              4 LOAD_NAME                0 (print)
              6 PUSH_NULL
              8 LOAD_NAME                1 (id)
             10 LOAD_NAME                2 (a)
             12 LOAD_METHOD              3 (copy)
             34 PRECALL                  0
             38 CALL                     0
             48 PRECALL                  1
             52 CALL                     1
             62 PUSH_NULL
             64 LOAD_NAME                1 (id)
             66 LOAD_NAME                2 (a)
             68 LOAD_METHOD              3 (copy)
             90 PRECALL                  0
             94 CALL                     0
            104 PRECALL                  1
            108 CALL                     1
            118 PRECALL                  2
            122 CALL                     2
            132 RETURN_VALUE

Of course, the bytecode doesn't tell you when or where an object is garbage collected.
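One way to observe the lifetimes directly is a small sketch like the following. Tracked, make and events are hypothetical names introduced only for this illustration; the list subclass logs its own creation and destruction. The ordering shown relies on CPython's reference counting and is not guaranteed on other implementations.

```python
# Hypothetical helper names (events, Tracked, make) used only for illustration.
events = []


class Tracked(list):
    """A list subclass that records when it is created and destroyed."""

    def __init__(self, *args):
        super().__init__(*args)
        events.append("created")

    def __del__(self):
        # On CPython this runs as soon as the last reference disappears.
        events.append("destroyed")


def make():
    return Tracked([1, 2, 3])


# Same shape as print(id(a.copy()), id(a.copy())): neither object is kept.
ids = (id(make()), id(make()))

# On CPython the first object is destroyed before the second is created.
print(events)
print(ids[0] == ids[1])  # often True on CPython, but not guaranteed
```

If the first object were still alive when the second was created, the log would show two "created" entries in a row, and the two ids would have to differ.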

Another way to see similar behavior:

>>> for _ in range(3):
...     print(id([]))
...
4370212416
4370212416
4370212416

Those were all distinct list objects, but they were able to re-use the id because they have non-overlapping lifetimes.
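The flip side can be sketched as well: as long as the objects are alive at the same time, their ids must differ. That is why "is" gave False in the question (both copies exist while they are compared) and why the copies assigned to variables had different ids. The variable names below are only for illustration.

```python
a = [1, 2, 3]

# Both copies are alive while `is` compares them, so they are distinct objects.
same_object = a.copy() is a.copy()
print(same_object)  # False

# Holding references keeps all copies alive, so their ids must differ.
copies = [a.copy() for _ in range(3)]
distinct_ids = {id(c) for c in copies}
print(len(distinct_ids))  # 3

# Pitfall: an id saved after its object is gone may match a later object.
stale_id = id(a.copy())
later = a.copy()
print(stale_id == id(later))  # may be True on CPython; not guaranteed
```

So the only realistic source of bugs is storing an id and using it after the object it came from has been reclaimed, for example as a dictionary key.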

So, this is all working as intended and documented.

juanpa.arrivillaga