discrepancy between interpreter and script regarding references

Question

I understand what's going on here wrt references:

>>> x = 5
>>> y = x
>>> id(x)
8729216
>>> id(y)
8729216

I also understand that with integers between -5 and 256, the Python interpreter has initialized an integer block ahead of time because of their frequency of use, so I expect the following:

>>> x = 5
>>> y = 5
>>> id(x)
8729216
>>> id(y)
8729216
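A minimal sketch of how the small-integer cache can be observed without literals getting in the way. It builds the integers at run time with int(), because two identical literals in one piece of code may end up sharing an object for a different reason. The helper name same_object is made up for this illustration, and the -5 to 256 range is a CPython implementation detail rather than a language guarantee.

```python
def same_object(text):
    """Parse the same text twice at run time and report whether
    both results are one and the same object."""
    first = int(text)
    second = int(text)
    return first is second


# Values inside CPython's small-integer cache come back as one shared object.
# Larger values are typically allocated fresh on every call in CPython,
# but other implementations may behave differently.
for text in ("-5", "5", "256", "1234567890"):
    print(text, same_object(text))
```

The takeaway is that identity (is, id()) reflects allocation decisions made by the implementation, so it should not be used to compare integer values.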

What I wasn't sure about was what would happen if an integer larger than 256 was created, so I typed some code into the interpreter:

>>> x = 1234567890
>>> y = 1234567890
>>> id(x)
140542533943248
>>> id(y)
140542533943088

OK, the id values are different, so two different integer objects were allocated; they just happen to have the same value.

I thought that was it, but then I ran the same bit of code in a script and the id values were the same:

x = 1234567890
y = 1234567890
print(id(x))
print(id(y))

Values printed to the screen:

139663862951888
139663862951888

Huh? Here they are referencing the same integer object. What gives?

mgilson · Answer 1 · 2016-09-08T23:44:56.877

We can disassemble the bytecode to check to see what is going on here:

def func():
    x = 1234567890
    y = 1234567890
    print(id(x))
    print(id(y))

import dis
dis.dis(func)

The output (Python 2.7 bytecode) is:

  3           0 LOAD_CONST               1 (1234567890)
              3 STORE_FAST               0 (x)

  4           6 LOAD_CONST               1 (1234567890)
              9 STORE_FAST               1 (y)

  5          12 LOAD_GLOBAL              0 (id)
             15 LOAD_FAST                0 (x)
             18 CALL_FUNCTION            1
             21 PRINT_ITEM          
             22 PRINT_NEWLINE       

  6          23 LOAD_GLOBAL              0 (id)
             26 LOAD_FAST                1 (y)
             29 CALL_FUNCTION            1
             32 PRINT_ITEM          
             33 PRINT_NEWLINE       
             34 LOAD_CONST               0 (None)
             37 RETURN_VALUE

Now we see that both values are fetched using the LOAD_CONST opcode, and that both loads use the same index (1) into the code object's table of constants.

The documentation doesn't make it easy to see what is going on here, but basically, the compiler has already seen this constant in this code object, so it reuses the same entry in the constant table (co_consts) instead of creating a second one. Obviously this only works with immutable literals, and it's a CPython-specific implementation detail.
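The sharing can also be checked without reading bytecode. The sketch below compiles the two assignments as a single unit, which is roughly what happens to a script, and then looks at the constants stored on the resulting code object. The names source, code, occurrences and namespace are illustrative, and the result describes CPython behaviour rather than anything the language promises.

```python
source = "x = 1234567890\ny = 1234567890\n"

# Compile both statements together, so they end up in one code object.
code = compile(source, "<example>", "exec")

# Count how many times the literal is stored in the constant table.
occurrences = sum(
    1 for const in code.co_consts
    if type(const) is int and const == 1234567890
)
print(code.co_consts, occurrences)

# Run the code object and compare the identities of the two names.
namespace = {}
exec(code, namespace)
print(namespace["x"] is namespace["y"])
```

Both names are bound to the single object held in co_consts, which is why id() reports the same value for x and y when the code runs as a script.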

Also note that constants are specific to a single code object (each function has its own):

def test1():
    return 1234567890

def test2():
    return 1234567890

a = test1()
b = test2()

print(a is b)  # False

def test3():
    a = 1234567890
    b = 1234567890
    return a, b

t = test3()
print(t[0] is t[1])  # True

The above results were generated using Python 2.7.10 and Python 3.6.0a2. If you run it with PyPy, you'll get True printed twice.
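To see where those constants live, here is a small sketch that inspects each function's own code object. The helper count_constant is hypothetical and only exists for this illustration. Whether two separate functions end up sharing one integer object depends on the interpreter and its version, so the sketch only checks the part that happens inside a single function.

```python
def test1():
    return 1234567890


def test3():
    a = 1234567890
    b = 1234567890
    return a, b


def count_constant(func, value):
    """Count how often an integer value appears in func's constant table."""
    return sum(
        1 for const in func.__code__.co_consts
        if type(const) is int and const == value
    )


# Each function carries its own table of constants on its code object.
print(test1.__code__.co_consts)
print(test3.__code__.co_consts)

# Inside test3 the literal is written twice but stored only once.
print(count_constant(test3, 1234567890))

first, second = test3()
print(first is second)
```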

Excellent explanation. I didn't know about the disassembler, so thanks for that. That might come in useful in the future. — MacGruber, Sep 09 '16 at 00:53

Craig Burgler · Answer 2 · 2016-09-09T00:59:02.817

The following interpreter examples support @mgilson's insights that the behavior described in the OP is a product of CPython's optimization of code objects:

tuple assignment (single code object: reference optimization)

>>> x, y = 1234567890, 1234567890
>>> x is y
True

compound statement (single code object: reference optimization)

>>> x = 1234567890; y = 1234567890
>>> x is y
True

multiple statements entered separately (each line is compiled to its own code object: no reference optimization)

>>> x = 1234567890
>>> y = 1234567890
>>> x is y
False
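The difference between the cases above can be reproduced from a script by imitating what the interactive prompt does. This is a rough sketch, assuming the prompt compiles each entered line on its own in "single" mode; the helper names run_separately and run_together are invented for the example.

```python
def run_separately(statements):
    """Compile and run each statement as its own code object,
    roughly the way the interactive prompt handles one line at a time."""
    namespace = {}
    for statement in statements:
        exec(compile(statement + "\n", "<stdin>", "single"), namespace)
    return namespace


def run_together(source):
    """Compile and run all statements as a single code object."""
    namespace = {}
    exec(compile(source, "<stdin>", "exec"), namespace)
    return namespace


separate = run_separately(["x = 1234567890", "y = 1234567890"])
together = run_together("x = 1234567890; y = 1234567890")

# Typically False in CPython: every compile() call builds its own constant.
print(separate["x"] is separate["y"])

# The two names share the one constant stored on the single code object.
print(together["x"] is together["y"])
```

In both cases x == y holds; only the identity of the objects differs, which is another reason to compare integers with == rather than is.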

edited Sep 09 '16 at 00:59

answered Sep 08 '16 at 23:54

Nice! Thanks for further clarifying this. – MacGruber Sep 09 '16 at 00:58
