I am trying to understand when Python interns constants and when it doesn't. I'm using Python 3.8.5 for this question. I understand that starting with Python 3.7, constant folding moved from the peephole optimizer to the AST optimizer, and that longer strings are now interned.

I thought I had this all under control until I tried running the same commands, in the same conda environment and with the same version of Python, in a Jupyter notebook and in the interactive interpreter.

>>> sys.version
'3.8.5 (default, Sep  4 2020, 02:22:02) \n[Clang 10.0.0 ]'

>>> "AvocadoAvocadoAvocadoAvocadoAvocado !" is "AvocadoAvocadoAvocadoAvocadoAvocado !"
<stdin>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
False

And in Jupyter Notebooks

import sys
sys.version

gives

'3.8.5 (default, Sep  4 2020, 02:22:02) \n[Clang 10.0.0 ]'

"AvocadoAvocadoAvocadoAvocadoAvocado !" is "AvocadoAvocadoAvocadoAvocadoAvocado !"

gives

<>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
<>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
<ipython-input-6-2414f185945a>:1: SyntaxWarning: "is" with a literal. Did you mean "=="?
  "AvocadoAvocadoAvocadoAvocadoAvocado !" is "AvocadoAvocadoAvocadoAvocadoAvocado !"
True

I can't figure out why the result is False in the interpreter and True in the notebook. I also wonder why there are three warnings in the notebook and only one in the interpreter and whether that holds any clues to why the results are different.

Why do I get False in the interactive interpreter and a True in Jupyter Notebooks?

B. Bogart
  • See https://stackoverflow.com/questions/1504717/why-does-comparing-strings-using-either-or-is-sometimes-produce-a-differe or https://stackoverflow.com/questions/26595/is-there-any-difference-between-foo-is-none-and-foo-none – Boris Verkhovskiy Nov 18 '20 at 00:55
  • @Boris. I understand that fully, and how interning works in Jupyter Notebooks and how to force static values to be interned. What I don't understand is why the same command behaves differently in the interpreter and Jupyter Notebooks. – B. Bogart Nov 18 '20 at 00:57
  • I cannot reproduce this behavior with 3.8.2. If the strings are indeed equal, I get `True` even with fairly long strings (over 200 chars). – Aaron Nov 18 '20 at 01:54
  • [This](https://github.com/satwikkansal/wtfpython/issues/100#issuecomment-549171287) looks very likely to be the answer for the different result with ipython (basically comes down to how much code is sent to the optimizer at a time before it makes it to the point of being a PyObject) – Aaron Nov 18 '20 at 02:07

1 Answer

There are many things in Python (and other languages) that may seem to work but go against the definition of how they're supposed to work. Object identity is one of them. The purpose of the is operator is never to compare values, but to test whether two names refer to the same underlying object. It makes sense that if two names refer to the same object then their values are also equal, but the reverse does not hold: two objects with equal values need not be the same object. Comparing literals with is will sometimes appear to work (as you have found) without raising an exception, but that is not a defined feature of Python. Such results are "implementation dependent" and are never guaranteed to be correct or even stable.
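As a minimal sketch of the difference (the variable names are only for illustration), the two strings below are built at runtime so that the compiler has no chance to share a single constant. Only the == result and the sys.intern result are guaranteed; what a is b prints depends on the implementation.

```python
import sys

# Build two equal strings at runtime so the compiler cannot share one constant.
a = "".join(["Avocado"] * 5) + " !"
b = "".join(["Avocado"] * 5) + " !"

print(a == b)  # value comparison: defined by the language
print(a is b)  # identity comparison: implementation-dependent, do not rely on it

# sys.intern() returns one canonical object per distinct string value,
# so identity is well-defined for the interned copies.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

If you really need identity semantics for strings, explicit interning like this is the supported route; otherwise use ==.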

Apparently IPython does not submit chunks of code to CPython in the same way the built-in REPL does: https://github.com/satwikkansal/wtfpython/issues/100#issuecomment-549171287

I would assume this is to reduce the number of messages the front-end has to send to the kernel when sending multiple lines of code. I would expect executing a .py file from the command line to match the results you get from IPython more closely in this regard.
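To see how the amount of code handed to the compiler at once can matter, here is a rough sketch using the built-in compile and exec. The helper names compare_in_one_unit and compare_in_separate_units are made up for this illustration, and this is not how IPython or the REPL actually feed code to the compiler. It only shows that two equal literals may or may not end up as the same object depending on whether they were compiled together. No expected output is given because the result is an implementation detail and can differ between versions and builds.

```python
def compare_in_one_unit(literal):
    """Compile both assignments as a single chunk, then test identity."""
    source = f"a = {literal!r}\nb = {literal!r}\n"
    namespace = {}
    exec(compile(source, "<one-unit>", "exec"), namespace)
    return namespace["a"] is namespace["b"]


def compare_in_separate_units(literal):
    """Compile each assignment on its own, then test identity."""
    namespace = {}
    for name in ("a", "b"):
        code = compile(f"{name} = {literal!r}\n", f"<unit-{name}>", "exec")
        exec(code, namespace)
    return namespace["a"] is namespace["b"]


text = "AvocadoAvocadoAvocadoAvocadoAvocado !"
print("one compilation unit:", compare_in_one_unit(text))
print("separate units:      ", compare_in_separate_units(text))
```

Whatever the two lines print on your machine, both a and b compare equal with ==, which is the only result you should build on.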

Along these lines, it is sometimes possible to recover objects after deletion but before garbage collection, because CPython's implementation of the id function returns the memory address of the object, which can be used with ctypes to construct a new PyObject. This is very much a way to introduce bugs and instability into your code. If for some reason id were switched out for a simple counter incremented for each allocated item (perhaps to avoid leaking any information about the process memory space), this would immediately break.
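For illustration only, the following sketch shows the safe half of that trick on CPython: turning an id back into a reference while the object is still alive. It assumes CPython, where id happens to be the memory address. The names obj, address and recovered are placeholders. Doing the same thing after the object has been freed is undefined behavior and can crash the interpreter, so that part is deliberately not shown.

```python
import ctypes

obj = ["still", "alive"]
address = id(obj)  # CPython detail: id() happens to be the memory address

# Safe only because `obj` is still referenced here, so the memory at
# `address` still holds a valid object.
recovered = ctypes.cast(address, ctypes.py_object).value

print(recovered is obj)
```

Nothing in the language promises that this works; it depends entirely on how one implementation chose to compute id.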

Aaron