I thought I understood Python names and immutable objects such as strings until I got this unexpected behaviour in a Jupyter notebook. Then I noticed that the same code gives a different result when you run it as a Python script file (.py) from the command line.
- Executing as a .py script (using Python 2.7.12)
Script file:
a, b = 'my text', 'my text'
print id(a), id(b), a is b
c = 'my text'
d = 'my text'
print id(c), id(d), c is d
Output:
4300053168 4300053168 True
4300053168 4300053168 True
As I expected - Python does not make copies of strings. All names point to the same object.
- Running interactively in IPython (using Python 2.7.12)
If I enter the exact same code as above into an IPython interactive shell or a Jupyter notebook cell, I get output like this:
4361310096 4361310096 True
4361509168 4361509648 False
In the second case, Python has created two new objects to represent 'my text'.
The reason for this post is that I am developing code in the notebook that uses identity tests such as a is 'my text' (rather than a == 'my text'). I thought this would be a very efficient yet readable way to achieve what I want. Obviously, for this to work consistently, I need to ensure that there are no duplicates of each string literal.
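To make that requirement concrete, here is a minimal sketch of the difference between the two tests. It assumes that explicit interning would be an acceptable way to guarantee a single object per string value; the names intern_str, literal and runtime are made up for illustration and are not part of my actual code. It is written to run on both Python 2 and Python 3.

```python
import sys

# Assumption: intern is a builtin on Python 2 and lives in sys on Python 3.
try:
    intern_str = sys.intern
except AttributeError:
    intern_str = intern  # Python 2 builtin

literal = 'my text'
# Built at runtime, so in CPython this is typically a separate object
# from the constant created for the literal above.
runtime = ' '.join(['my', 'text'])

# Equality compares contents, so it holds however the strings were created.
equal_by_value = (runtime == literal)

# Interning returns one canonical object per string value, so identity
# holds once both sides have been interned.
same_after_intern = (intern_str(runtime) is intern_str(literal))

print(equal_by_value)
print(same_after_intern)
```

Without the intern_str calls, whether runtime is literal holds depends on how the interpreter happened to create the two objects, which is exactly the inconsistency shown in the outputs above.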