Why does the `is` operator behave differently in a script vs the REPL?

Question

In Python, the same code gives different results depending on how it is run. As a script:

a = 300
b = 300
print(a == b)
print(a is b)      # prints True
print("id(a) = %d, id(b) = %d" % (id(a), id(b)))  # a and b have the same address

But in the interactive shell (REPL):

>>> a = 300
>>> b = 300
>>> a is b
False
>>> id(a)
4501364368
>>> id(b)
4501362224

The `is` operator gives different results in the two cases. Why?

This one then: https://stackoverflow.com/questions/13650293/understanding-pythons-is-operator — ChrisGPT was on strike, Mar 25 '19 at 23:04
@Aran-Fey: it's not all that important what type of object is being compared this way, but in this particular case it's because the entity that compiled the `.py` source file noticed that 300 == 300 so made only one 300 instance, while the interpreter that read the `>>>` lines didn't notice that 300 == 300 so made two separate 300 instances. — torek, Mar 25 '19 at 23:05
Still, `is` is for object identity. Only use it when that's what you want. — ChrisGPT was on strike, Mar 25 '19 at 23:05
@Chris I don't think that really answers the question, either. Like torek said, this question is about the difference between running the same code in the REPL vs as a script. — Aran-Fey, Mar 25 '19 at 23:06
I could've sworn I've written an answer about this topic in the past, but I can't find it... Edit: [Found it](https://stackoverflow.com/q/53026920/1222951). Not a perfect duplicate though, so I'll refrain from closing. — Aran-Fey, Mar 25 '19 at 23:10
@OndrejK. That question is a bit different. That behavior is caused by string interning. This one is not. — Aran-Fey, Mar 25 '19 at 23:14
As @Chris says. In other words, the identity of two immutable objects with the same value is incidental, neither guaranteed nor required, so why it matches in one case and not the other is immaterial. — Ondrej K., Mar 25 '19 at 23:14
Folks, none of those answers actually answers the question. :-( — John Szakmeister, Mar 25 '19 at 23:34
What is actually happening here is that CPython recognizes at compile time in the script that `300` is a literal and immutable, so `a` and `b` can both refer to the same object rather than taking up more constant slots. This doesn't work at the REPL, where each line is evaluated individually. But if you were to wrap that code in a function, you'd find that the REPL gives the same behavior when calling the function as the script does, because the compiler can see that both `a` and `b` refer to the same literal. I've demonstrated it in my answer below. — John Szakmeister, Mar 25 '19 at 23:48
[Python 3.6.5 “is” and “==” for integers beyond caching interval](https://stackoverflow.com/questions/53026920/python-3-6-5-is-and-for-integers-beyond-caching-interval) looks like an exact duplicate to me, and its answers *do* appear to me to cover the same ground as the (excellent) answers here. (OTOH, the answers here are perhaps even better, which is why I'm not closing this instance either). — Charles Duffy, Mar 25 '19 at 23:58
@CharlesDuffy Oooh, that is a duplicate. Unfortunately, that wasn't in the list of answers when the question was originally closed. :-( — John Szakmeister, Mar 26 '19 at 00:01
@JohnSzakmeister, perhaps you might move your answer over to the other instance? That would make me feel a lot better about closing this one, and concentrate all the answers in one canonical location. — Charles Duffy, Mar 26 '19 at 12:07
@wim, I was initially hesitant to do that because I think that while this instance has better answers, the other one was more clearly asked. — Charles Duffy, Mar 26 '19 at 16:00

wim · Answer 1 · 2022-07-08T17:36:02.180

When you run code in a .py script, the entire file is compiled into a code object before execution. This lets CPython make certain optimizations, such as reusing the same instance for the integer 300.

You could also reproduce that in the REPL, by executing code in a context more closely resembling the execution of a script:

>>> source = """\
... a = 300
... b = 300
... print(a == b)
... print(a is b)  # prints True
... print("id(a) = %d, id(b) = %d" % (id(a), id(b)))  # same address
... """
>>> code_obj = compile(source, filename="myscript.py", mode="exec")
>>> exec(code_obj)
True
True
id(a) = 140736953597776, id(b) = 140736953597776
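To see why the compiled script shares one object, here is a minimal sketch that inspects the module code object's constants table (`co_consts` is a CPython code-object attribute; the filename `"<script>"` is an arbitrary placeholder). The identity result is a CPython implementation detail, not a language guarantee.

```python
# Compile two assignments together as one module, the way a .py file is compiled.
source = "a = 300\nb = 300\n"
code_obj = compile(source, filename="<script>", mode="exec")

# Inspect the module's constants table.
print(code_obj.co_consts)
print(code_obj.co_consts.count(300))  # 1: both assignments share this entry

namespace = {}
exec(code_obj, namespace)
print(namespace["a"] is namespace["b"])  # True in CPython; an implementation detail
```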

Some of these optimizations are pretty aggressive. You could change the script line b = 300 to b = 150 + 150, and CPython would still "fold" b into the same constant (in Python 3.7 and later; see ead's comment about 3.6). If you're interested in such implementation details, in older CPython releases look in peephole.c, search for PyCode_Optimize, and look for any info about the "consts table".
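A minimal sketch of that folding claim, assuming CPython 3.7 or later (per ead's comment, 3.6 does not merge the folded value with the existing literal). The filename `"<script>"` is an arbitrary placeholder, and the result is an implementation detail.

```python
# b is written as an expression the compiler can fold at compile time.
source = "a = 300\nb = 150 + 150\n"
code_obj = compile(source, filename="<script>", mode="exec")

# Expect the folded value 300 to appear in the constants table,
# shared by both assignments.
print(code_obj.co_consts)

namespace = {}
exec(code_obj, namespace)
print(namespace["a"] == namespace["b"])  # True: same value
print(namespace["a"] is namespace["b"])  # True on CPython 3.7+ (folded and deduplicated)
```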

In contrast, when you run code line by line directly in the REPL, it executes in a different context. Each line is compiled separately in "single" mode, so the compiler never sees both 300 literals in the same code object and cannot make them share one constant.

>>> scope = {}
>>> lines = source.splitlines()
>>> for line in lines:
...     code_obj = compile(line, filename="<I'm in the REPL>", mode="single")
...     exec(code_obj, scope)
...
True
False
id(a) = 140737087176016, id(b) = 140737087176080
>>> scope['a'], scope['b']
(300, 300)
>>> id(scope['a']), id(scope['b'])
(140737087176016, 140737087176080)
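The same effect can be isolated to just two statements. This sketch, assuming a default CPython build, compiles each line on its own in `"single"` mode and compares the `300` constants the two code objects end up holding; the filename `"<repl>"` is an arbitrary placeholder.

```python
# Compile each statement on its own in "single" mode, as the interactive REPL does.
line_a = compile("a = 300", filename="<repl>", mode="single")
line_b = compile("b = 300", filename="<repl>", mode="single")

# Each compile() call builds its own constants table, so each gets its own 300.
const_a = next(c for c in line_a.co_consts if c == 300)
const_b = next(c for c in line_b.co_consts if c == 300)
print(const_a == const_b)  # True: same value
print(const_a is const_b)  # False in a default CPython build: different objects

scope = {}
exec(line_a, scope)
exec(line_b, scope)
print(scope["a"] is scope["b"])  # False, matching the REPL session
```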

There is no such optimization for `b=150+150` in Python 3.6, only in later versions. — ead, Mar 26 '19 at 20:56

John Szakmeister · Answer 2 · 2019-03-25T23:50:16.550

There are actually two things to know about CPython and its behavior here. First, small integers in the range of [-5, 256] are interned internally. So any value falling in that range will share the same id, even at the REPL:

>>> a = 100
>>> b = 100
>>> a is b
True

Since 300 > 256, it's not being interned:

>>> a = 300
>>> b = 300
>>> a is b
False
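The small-integer cache can be observed separately from any compile-time sharing. This sketch builds each value at run time with `int()` on a string, so only CPython's cache (documented for `PyLong_FromLong` as covering -5 to 256) can make two results the same object. The helper name `identical_when_built_twice` is hypothetical.

```python
def identical_when_built_twice(text):
    # int() parses the string at run time, so compile-time constant sharing
    # plays no part; only CPython's small-integer cache can return one object.
    return int(text) is int(text)


for text in ("100", "256", "257", "300"):
    print(text, identical_when_built_twice(text))
# CPython: 100 True, 256 True, 257 False, 300 False
```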

Second, in a script, literals are stored in the constants table of the compiled code object. Python is smart enough to realize that since both a and b refer to the literal 300, and 300 is an immutable object, both can reference the same constant slot. You can see this if you tweak your script a bit, wrapping the code in a function and disassembling it:

def foo():
    a = 300
    b = 300
    print(a==b)
    print(a is b)
    print("id(a) = %d, id(b) = %d" % (id(a), id(b)))


import dis
dis.disassemble(foo.__code__)

The beginning of the output looks like this (exact opcodes and offsets vary between Python versions):

2           0 LOAD_CONST               1 (300)
            2 STORE_FAST               0 (a)

3           4 LOAD_CONST               1 (300)
            6 STORE_FAST               1 (b)

...

As you can see, CPython loads both a and b from the same constant slot (index 1). This means that a and b refer to the same object, which is why a is b is True in the script but not at the REPL.
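As wim suggests in the comments, the same check can be made on module-scoped code without a function wrapper, because `dis.Bytecode` accepts a source string. This sketch collects the instructions that load the constant 300 and checks their slot indices; matching on opnames starting with `LOAD_CONST` is an assumption meant to tolerate opcode naming differences between versions.

```python
import dis

source = "a = 300\nb = 300\n"

# dis.Bytecode compiles the source string at module scope for us.
loads_of_300 = [
    instr
    for instr in dis.Bytecode(source)
    if instr.opname.startswith("LOAD_CONST") and instr.argval == 300
]

for instr in loads_of_300:
    print(instr.opname, instr.arg, instr.argval)

# Both loads use the same index into co_consts, i.e. the same object.
slots = {instr.arg for instr in loads_of_300}
print(slots)
```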

You can see this behavior in the REPL too, if you wrap your statements in a function:

>>> import dis
>>> def foo():
...   a = 300
...   b = 300
...   print(a==b)
...   print(a is b)
...   print("id(a) = %d, id(b) = %d" % (id(a), id(b)))
...
>>> foo()
True
True
id(a) = 4369383056, id(b) = 4369383056
>>> dis.disassemble(foo.__code__)
  2           0 LOAD_CONST               1 (300)
              2 STORE_FAST               0 (a)

  3           4 LOAD_CONST               1 (300)
              6 STORE_FAST               1 (b)
# snipped...

Bottom line: while CPython makes these optimizations at times, you shouldn't count on them. They are implementation details, and ones that have changed over time (the small-integer cache, for example, used to cover only integers up to 100). If you're comparing numbers, use ==. :-)

This answer leaves a lot to be desired: by "tweaking the script" you're putting the code into a function scope (local vars) instead of a module scope (global vars), which - in terms of execution and name resolution - is now a completely different scoping situation. I think that somewhat invalidates the disassembly. Yes, it happens that the same peephole optimizer is used over the module code in the same way it's used over a function body, but that certainly doesn't have to be the case. And the answer also hasn't really explained *why* the same optimization is not done directly at the REPL. — wim, Mar 26 '19 at 15:51
@wim I'll certainly delete the answer if you don't think it meets the mark. I personally don't feel the tweaking was negative, but I hear you: it's not *exactly* the same. But I think all of this is pretty tied to what Python does today and isn't a guarantee of what Python may do in the future. And that's a fair criticism: I did not explicitly state why line by line at the REPL is different. — John Szakmeister, Mar 26 '19 at 16:55
Oh, I don't think you should delete it, just update it! I do think the disassembly demonstration is useful content. However, it should `dis` module scoped code, not function scoped code - recent versions of dis allow you to pass in a string of source directly, so it's not necessary to use a function object's `__code__` attribute. — wim, Mar 26 '19 at 17:16
@wim That doesn't seem very fair to you--I think you've done the work on that front already and your answer should be the accepted answer (though I think adding the interning of small integers to the answer is good, just in case people try and get results they don't expect). — John Szakmeister, Mar 27 '19 at 21:09
