I thought I understood Python names and immutable objects such as strings until I got this unexpected behaviour in a Jupyter notebook. Then I noticed that the same code gives a different result when you run it as a Python script file (.py) from the command line.

  1. Executing as .py script (using Python 2.7.12)

Script file:

a, b = 'my text', 'my text'

print id(a), id(b), a is b

c = 'my text'
d = 'my text'

print id(c), id(d), c is d

Output

4300053168 4300053168 True
4300053168 4300053168 True

As I expected - Python does not make copies of strings. All names point to the same object.

  2. Interpreting in interactive IPython (version 2.7.12)

If I enter the exact same code as above into an IPython interactive shell or a Jupyter notebook cell, I get output like this:

4361310096 4361310096 True
4361509168 4361509648 False

In the second case, Python has created two new objects to represent 'my text'.
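One way to probe this outside a notebook is to compile the same statements together and then separately. This is only a minimal sketch, assuming CPython; the names `source_together`, `ns_together` and `ns_separate` are illustrative. It also rests on an assumption not confirmed anywhere in this post: that a script is compiled as one unit, while the interactive shell may compile statements one at a time.

```python
# Sketch only: constant sharing is a CPython implementation detail,
# not a language guarantee.
source_together = "c = 'my text'\nd = 'my text'\n"

# One compilation unit: the compiler can reuse a single constant
# for both identical literals.
ns_together = {}
exec(compile(source_together, "<script>", "exec"), ns_together)
same_unit_identical = ns_together["c"] is ns_together["d"]

# Two compilation units, roughly like entering each statement on its own.
ns_separate = {}
exec(compile("c = 'my text'\n", "<cell-1>", "exec"), ns_separate)
exec(compile("d = 'my text'\n", "<cell-2>", "exec"), ns_separate)
separate_units_identical = ns_separate["c"] is ns_separate["d"]
separate_units_equal = ns_separate["c"] == ns_separate["d"]

print(same_unit_identical)       # identity within one compilation unit
print(separate_units_identical)  # identity across units; varies by version
print(separate_units_equal)      # equality of contents
```

Whatever the identity results are on a given interpreter, the `==` comparison does not depend on how the code was compiled.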

The reason for this post is that I am developing code in the notebook that uses identity tests such as `a is 'my text'` (rather than `a == 'my text'`). I thought this would be an efficient yet readable way to do what I need. Obviously, for this to work consistently, I need to ensure that there are no duplicates of each string literal.

Bill
  • But basically, `a, b = 'my text', 'my text'` is not guaranteed to create a single `'my text'` string object, and it depends on rather cryptic string interning implementation details. You should *never* use `is` to check for string equality. – juanpa.arrivillaga May 16 '17 at 21:19
  • And actually, under normal Python behavior you wouldn't expect this to produce a single shared object: after `a, b = (1,2), (1,2)`, `a is b` is `False`, and the fact that it is sometimes `True` with strings is what should be surprising. Essentially, in some cases the CPython interpreter can optimize and intern string *literals*, but this is an implementation detail that should not be relied upon. See [this](http://stackoverflow.com/a/1504848/5014455) answer in the dupe target. – juanpa.arrivillaga May 16 '17 at 21:22
  • My understanding is that this is due to interned strings. The explanation I found helpful was the second response to a linked question. The direct link is here: http://stackoverflow.com/a/1504848/1772166. Additionally, the `intern` function may be of assistance, per this link: http://stackoverflow.com/a/1504870/1772166 – khan May 16 '17 at 21:25
  • Thanks everyone. That's a shame, as the identity check is so much faster than a string comparison (20 times in the quick test I did). Can you at least rely on identical string literals being the same object in a script file, and thus use identity checking, or is it better to avoid identity checks completely? – Bill May 16 '17 at 21:34
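The interning idea raised in the comments above can be sketched as follows. This is a hedged illustration rather than a recommendation: it is written for Python 3, where the function lives at `sys.intern` (in Python 2, as used in the question, `intern` is a builtin), and the variable names are made up for the example.

```python
import sys

# Build two equal strings at runtime so no shared literal is involved.
words = ["my", "text"]
first = " ".join(words)
second = " ".join(words)

print(first == second)  # equality compares contents
print(first is second)  # identity compares objects; not something to rely on

# sys.intern returns one canonical object for each distinct string value,
# so identity becomes meaningful only for strings passed through it.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
print(first_interned is second_interned)
```

Identity checks are only safe here if every string involved has gone through `sys.intern`, which is easy to get wrong; `==` stays correct either way.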
