Uncommon behaviour of IS operator in python

Question

From some of the answers on Stack Overflow, I learned that the integers from -5 to 256 always refer to the same memory location, so we get True for:

>>> a = 256
>>> a is 256
True

Now comes the twist (please read this part before marking the question as a duplicate):

>>> a = 257
>>> a is 257 
False

This is completely understood, but now if I do:

>>> a = 257; a is 257
True
>>> a = 12345; a is 12345
True

Why?

I bet because the interpreter can "see" and analyze _both_ statements at the same time, Python is able to do some optimizations to eliminate the variables altogether and do stuff like `12345 is 12345`. — ForceBru, Jun 15 '18 at 19:40
...or not: `import dis; dis.disco(compile('a=P;a is P', '', 'exec'))` returns similar assembly for both 256 and 12345... — ForceBru, Jun 15 '18 at 19:52
This question is epic. we have hit really good conclusions. Thanks for asking it! — Attersson, Jun 15 '18 at 20:22
I am new to Stackoverflow and I am really thankful to this community. — Ritesh, Jun 15 '18 at 20:30

Blckknght · Accepted Answer · 2018-06-15T21:54:45.857

What you're seeing is an optimization in CPython's compiler (which compiles your source code into the bytecode that the interpreter runs). Whenever the same immutable constant value is used in several different places within a chunk of code that is compiled in one step, the compiler will try to use a reference to the same object in each place.

So if you do multiple assignments on the same line in an interactive session, you'll get two references to the same object, but you won't if you use two separate lines:

>>> x = 257; y = 257  # multiple statements on the same line are compiled in one step
>>> print(x is y)     # prints True
>>> x = 257
>>> y = 257
>>> print(x is y)     # prints False this time, since the assignments were compiled separately
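
The same effect can be reproduced without the interactive shell by calling compile() yourself. This is only a sketch, assuming CPython; the helper name run_and_compare is made up for the illustration, and the result of the separate compilations is deliberately not stated because it depends on the implementation and version.

```python
def run_and_compare(*sources):
    """Compile each source string separately, run them in one shared
    namespace, and report whether x and y ended up as the same object.
    (Hypothetical helper, written only for this illustration.)"""
    namespace = {}
    for source in sources:
        exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["x"] is namespace["y"]

# One compilation step: the compiler sees both 257 constants together.
together = run_and_compare("x = 257\ny = 257")

# Two compilation steps: each 257 belongs to a different code object.
separate = run_and_compare("x = 257", "y = 257")

print("compiled together:", together)
print("compiled separately:", separate)  # implementation-dependent
```

On the CPython versions discussed here the first result is True; the second one is the case you should not make assumptions about.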

Another place this optimization comes up is in the body of a function. The whole function body will be compiled together, so any constants used anywhere in the function can be combined, even if they're on separate lines:

def foo():
    x = 257
    y = 257
    return x is y  # this will always return True
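
If you want to see the sharing rather than just its effect, you can look at the constants stored on the function's code object. A small sketch, again assuming CPython (co_consts is a CPython detail and its exact contents can change between versions):

```python
def foo():
    x = 257
    y = 257
    return x is y

# The compiled function keeps its constants in a tuple on the code object.
constants = foo.__code__.co_consts

# 257 is used twice in the source but should be stored only once.
occurrences = sum(1 for const in constants if const == 257)

print(constants)
print("257 stored", occurrences, "time(s)")
print("foo() ->", foo())
```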

While it's interesting to investigate optimizations like this one, you should never rely upon this behavior in your normal code. Different Python interpreters, and even different versions of CPython may do these optimizations differently or not at all. If your code depends on a specific optimization, it may be completely broken for somebody else who tries to run it on their own system.
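
In practice that means comparing values with == and keeping is for real identity checks such as None. A minimal sketch (the variable names are only for illustration):

```python
# Build the numbers at run time so no compiler optimization is involved.
a = int("257")
b = int("257")

# Value comparison: well-defined on every Python implementation.
values_match = (a == b)

# Identity comparison is appropriate for singletons such as None.
result = None
nothing_returned = result is None

print(values_match)      # True
print(nothing_returned)  # True
# `a is b` may be True or False depending on the interpreter; don't rely on it.
```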

As an example, the two assignments on the same line that I show in my first code block above don't result in two references to the same object when I run them in the interactive shell inside Spyder (my preferred IDE). I have no idea why that specific situation doesn't work the same way it does in a conventional interactive shell, but the different behavior is my fault, since my code relies upon implementation-specific behavior.

Attersson · Answer 2 · 2018-06-15T20:55:05.890

After discussion and testing in various versions, the final conclusions can be drawn.

Python compiles and runs instructions in blocks. Which instructions end up in one block depends on the syntax used, the Python version, the operating system and the distribution, so different setups may give different results.

The general rules are:

(from official documentation)

The current implementation keeps an array of integer objects for all integers between -5 and 256

Therefore:

a = 256
id(a)
Out[2]: 1997190544
id(256)
Out[3]: 1997190544 # int actually stored once within Python

a = 257
id(a)
Out[5]: 2365489141456
id(257)
Out[6]: 2365489140880 # literal, temporary; as you can see, the ids differ
id(257)
Out[7]: 2365489142192 # literal, temporary; it gets a new id every time
                      # since it is not pre-stored
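
To check the cache without any compiler effects getting in the way, the integers can be created at run time with int(). This is a sketch assuming CPython's small-integer cache as quoted above; the variable names are made up for the example.

```python
# int("...") creates the value at run time, so the compiler cannot
# merge the two objects into one shared constant.
small_a = int("256")
small_b = int("256")
large_a = int("257")
large_b = int("257")

# Inside the cached range both names refer to the pre-stored object.
print(small_a is small_b, id(small_a) == id(small_b))

# Outside the range only equality is guaranteed.
print(large_a == large_b)
print(large_a is large_b)  # implementation-dependent, usually False on CPython
```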

The part below returns False in Python 3.6.3 |Anaconda custom (64-bit)| (default, Oct 17 2017, 23:26:12) [MSC v.1900 64 bit (AMD64)]

a = 257; a is 257
Out[8]: False

But

a=257; print(a is 257) ; a=258; print(a is 257)
>>>True
>>>False

As is evident, what Python takes in "one block" is hard to predict: it depends on how the code is written (on a single line or not) as well as on the version, operating system and distribution used.

edited Jun 15 '18 at 20:55

answered Jun 15 '18 at 19:44

I get True for `a = x; a is x` for any x on 3.6, any idea what the difference is between that and splitting it on two lines? – bphi Jun 15 '18 at 19:49
Can't confirm your last result with Python 3.6.1 – ForceBru Jun 15 '18 at 19:49
I am using Python 3.6.3 and seriously, can you install the latest version and verify please? We might have come across a difference in implementation – Attersson Jun 15 '18 at 19:51
Using version 3.6.4 – bphi Jun 15 '18 at 19:57
Python 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. '>>> a=257;a is 257' 'True' – Ritesh Jun 15 '18 at 19:58
Thanks. It means that in the newest version, Python adopts a by-line buffering to avoid redeclaring identical symbols and make duplicates in memory – Attersson Jun 15 '18 at 19:59
Can you test `a=257; print(a is 257) ; a=258; print(a is 257)` ? – Attersson Jun 15 '18 at 20:06
it gives True \n False – Ritesh Jun 15 '18 at 20:09
It's not actually line by line. It's whatever gets compiled in one go. A function's body, for instance, will usually be compiled all in one step, so `def foo():x=1000;y=1000;return x is y` (with newlines and appropriate indentation) will use the same constant integer object for both `x` and `y`, and so return `True`. – Blckknght Jun 15 '18 at 20:09
Yes, this definitely proves it. – Attersson Jun 15 '18 at 20:09
@Blckknght why don't you write it as an answer? I needed something like this that's easy to understand :) . Thanks – Ritesh Jun 15 '18 at 20:14

score 2 · Answer 3 · answered Jun 15 '18 at 19:52

Generally speaking, numbers outside the range -5 to 256 will not necessarily get the optimization that is applied to numbers within that range. However, Python is free to apply other optimizations as appropriate. In your case, you're seeing that the same literal value used multiple times on one line is stored in a single memory location no matter how many times it's used on that line. Here are some other examples of this behavior:

>>> s = 'a'; s is 'a'
True
>>> s = 'asdfghjklzxcvbnmsdhasjkdhskdja'; s is 'asdfghjklzxcvbnmsdhasjkdhskdja'
True
>>> x = 3.14159; x is 3.14159
True
>>> t = 'a' + 'b'; t is 'a' + 'b'
True
>>>
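
The `'a' + 'b'` case works because, as far as I know, CPython evaluates constant expressions like this at compile time and then stores the result as an ordinary constant. A sketch for checking that on your own interpreter (CPython-specific, and the exact contents of co_consts may vary between versions):

```python
# Compile two statements that both use the expression 'a' + 'b'.
code = compile("t = 'a' + 'b'\nu = 'a' + 'b'", "<demo>", "exec")

# If the expression was folded at compile time, the result 'ab'
# shows up among the code object's constants.
string_consts = [c for c in code.co_consts if isinstance(c, str)]
print(string_consts)

namespace = {}
exec(code, namespace)
print(namespace["t"] == namespace["u"])  # True
print(namespace["t"] is namespace["u"])  # implementation-dependent
```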

BPL · Answer 4 · 2018-06-15T20:16:26.970

From python2 docs:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value. [6]

From python3 docs:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. Object identity is determined using the id() function. x is not y yields the inverse truth value. [4]

So basically, the key to understanding the tests you've run in the REPL is the id() function. Here's an example that shows what's going on behind the curtains:

>>> a=256
>>> id(a);id(256);a is 256
2012996640
2012996640
True
>>> a=257
>>> id(a);id(257);a is 257
36163472
36162032
False
>>> a=257;id(a);id(257);a is 257
36162496
36162496
True
>>> a=12345;id(a);id(12345);a is 12345
36162240
36162240
True

That said, a good way to understand what's going on behind the curtains with this type of snippet is usually to use either dis.dis or dis.disco. Let's take a look, for instance, at what this snippet compiles to:

import dis
import textwrap

dis.disco(compile(textwrap.dedent("""\
    a=256
    a is 256
    a=257
    a is 257
    a=257;a is 257
    a=12345;a is 12345\
"""), '', 'exec'))

the output would be:

  1           0 LOAD_CONST               0 (256)
              2 STORE_NAME               0 (a)

  2           4 LOAD_NAME                0 (a)
              6 LOAD_CONST               0 (256)
              8 COMPARE_OP               8 (is)
             10 POP_TOP

  3          12 LOAD_CONST               1 (257)
             14 STORE_NAME               0 (a)

  4          16 LOAD_NAME                0 (a)
             18 LOAD_CONST               1 (257)
             20 COMPARE_OP               8 (is)
             22 POP_TOP

  5          24 LOAD_CONST               1 (257)
             26 STORE_NAME               0 (a)
             28 LOAD_NAME                0 (a)
             30 LOAD_CONST               1 (257)
             32 COMPARE_OP               8 (is)
             34 POP_TOP

  6          36 LOAD_CONST               2 (12345)
             38 STORE_NAME               0 (a)
             40 LOAD_NAME                0 (a)
             42 LOAD_CONST               2 (12345)
             44 COMPARE_OP               8 (is)
             46 POP_TOP
             48 LOAD_CONST               3 (None)
             50 RETURN_VALUE

As we can see, in this case the asm output doesn't tell us very much: lines 3-4 are basically the "same" instructions as line 5. So my recommendation would be, once again, to use id() smartly so you'll know what is will compare. If you want to know exactly what type of optimizations CPython is doing, I'm afraid you'd need to dig into its source code.
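
One thing the disassembly does show is the constant index next to each LOAD_CONST: instructions with the same index load the same object from that code object. A sketch using dis.get_instructions, assuming CPython (opcode names are an implementation detail and can differ between versions):

```python
import dis

# Both assignments are compiled in one step, as in the snippet above.
code = compile("a = 257\nb = 257", "<demo>", "exec")

# Collect every LOAD_CONST that loads the value 257.
loads = [
    ins for ins in dis.get_instructions(code)
    if ins.opname == "LOAD_CONST" and ins.argval == 257
]

# ins.arg is the index into code.co_consts; one distinct index means
# both statements load the very same int object.
indexes = {ins.arg for ins in loads}
print(len(loads), "loads using constant index(es)", indexes)
```

Code that is compiled separately (for example two lines typed one after the other in the REPL) ends up in different code objects, which is why the disassembly of a single compile() call cannot show the difference.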

The question is, _why_ do the IDs behave like this? Why executing the two statements separately gives different IDs, but executing them as the same line produces equal IDs? — ForceBru, Jun 15 '18 at 20:00
Something you might want to look into: `a = 257; a is 257` in 3.6.4 returns True when I run it on Windows, but False when run in a Jupyter notebook — bphi, Jun 15 '18 at 20:18
@bphi That's interesting and I guess the generated asm in a Jupyter notebook will be the same than posted on my answer, right? — BPL, Jun 15 '18 at 20:20
@BPL dis.disco gives the same output, but 3rd and 4th examples return False instead of True — bphi, Jun 15 '18 at 20:26
@bphi Yeah, makes sense. I've compared the asm generated by 2.7.12 / 3.6.4 (x86) && 3.6.4 (x64), all of them CPython implementations on Windows, and both 3.6 builds provide the same asm output; only 2.7.12 differs... but just in the opcode sizes. Which is explained in the docs --> "Changed in version 3.6: Use 2 bytes for each instruction. Previously the number of bytes varied by instruction." — BPL, Jun 15 '18 at 20:27
For as bad as it may seem, what I wrote in my answer appears correct and proved by these differences. Every single combination of version, operating system and distro, may result in different results when it comes to these differences. We have found a flaw of Python. Congratz gentlemen. — Attersson, Jun 15 '18 at 20:31
@bphi Well, my answer still remains valid. To check that, just run the first snippet provided on my answer. The key here is using smartly the `id()` function and check what your python implementation is giving you. But if you really want to know what's really going on, dig further in the python implementation source code. — BPL, Jun 15 '18 at 20:31
