Why do string comparison and identity behave differently in pdb and python console

Question

I ran the same snippet of Python code in the Python console and in pdb, but got different results, as shown below:

pdb:

>>> import pdb
>>> pdb.set_trace()
(Pdb) print u'你好' == u'\u4f60\u597d'
False
(Pdb) print u'你好' is u'\u4f60\u597d'
False
(Pdb) print id(u'你好'), id(u'\u4f60\u597d')
4431713024 4431713120
(Pdb) id(u'你好')
4431713024
(Pdb) id(u'\u4f60\u597d')
4431713024

python console:

>>> print u'你好' == u'\u4f60\u597d'
True
>>> print u'你好' is u'\u4f60\u597d'
True
>>> print id(u'你好'), id(u'\u4f60\u597d')
4376711984 4376711984
>>> id(u'你好')
4376711984
>>> id(u'\u4f60\u597d')
4376711984

My Python version is 2.7.13.

So my questions:

1. Why do operators (like '==' and 'is') behave differently in the two consoles?

2. In pdb, why does id(u'\u4f60\u597d') equal 4431713120 in

print id(u'你好'), id(u'\u4f60\u597d')

but 4431713024 in

id(u'\u4f60\u597d')

3. Why does this not happen in Python 3?

Python uses string interning. `pdb` and the Python console *may* implement it differently. See https://stackoverflow.com/questions/15541404/python-string-interning?noredirect=1&lq=1 and https://en.wikipedia.org/wiki/String_interning for more general info — DeepSpace, Feb 27 '18 at 12:34
@DeepSpace That explains all the differences with the `is` comparison. But not the `==` one? — Graipher, Feb 27 '18 at 13:16

score 1 · Answer 1 · answered Feb 27 '18 at 13:40

Let's start with the `is` checks, because they are slightly easier to explain.

Note that when you check the ids on two separate lines, both the interpreter and the debugger show the same id for both strings. This is because the first string is created at some address and you print its id. Nothing holds a reference to that string once the call returns, so it is freed immediately. The second string is then created and takes the first suitable free block of memory, which just happens to be the one that was released a moment ago. It therefore has the same id the first string had while it was alive.
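The difference between a temporary string and one that is kept alive can be sketched as follows. This is only an illustration under CPython assumptions: `fresh` is a made-up helper that builds the string at runtime, and whether the two temporary ids coincide depends on the allocator, so no particular output is promised for that line.

```python
def fresh():
    # Hypothetical helper: build the string at runtime so that no
    # compile-time constant is reused between calls.
    return u"".join([u"\u4f60", u"\u597d"])

# Temporaries: each string can be freed as soon as id() returns, so the
# second one MAY be placed at the same address (allocator dependent).
first_id = id(fresh())
second_id = id(fresh())
print("temporary ids equal: %s" % (first_id == second_id))

# Both strings are alive at the same time here, so if they are two
# separate objects they cannot share an id.
a = fresh()
b = fresh()
print("equal: %s, same object: %s" % (a == b, a is b))
```

The first line of output may vary between runs and builds; the point is only that equal ids for objects with non-overlapping lifetimes say nothing about identity.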

When checking the ids on the same line, things happen differently, because both strings exist at the same time. Here the interpreter and the debugger differ in their behavior. The interpreter interns the strings, so they are the same object and therefore have the same id, while the debugger does not. (Refer to "Python string interning", linked by @DeepSpace in the comments, for more information on interning.)
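A small sketch of the same-line case, assuming CPython: when two equal literals are compiled together, the compiler typically stores a single constant and both names end up bound to it. The source string and the `namespace` dict are made up for illustration, and the identity result is an implementation detail rather than a language guarantee.

```python
# Two equal unicode literals compiled as one unit of source code.
source = "x = u'\\u4f60\\u597d'; y = u'\\u4f60\\u597d'"
code = compile(source, "<demo>", "exec")

namespace = {}
exec(code, namespace)

# Equality is guaranteed by the language.
print(namespace["x"] == namespace["y"])
# Identity depends on the implementation (CPython usually shares the constant).
print(namespace["x"] is namespace["y"])
```

This sharing is only possible when the two literals really denote the same string, which is the condition the pdb session above fails.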

I think the reason the debugger does not intern them can be seen in the first test, u'你好' == u'\u4f60\u597d'. The interpreter and the debugger represent these two strings differently: the debugger thinks they are two different strings, so it cannot intern them as one object.

The debugger assigns different code points to the two strings:

(Pdb) map(ord, u'你好')
[228, 189, 160, 229, 165, 189]
(Pdb) map(ord, u'\u4f60\u597d')
[20320, 22909]

While the interpreter does not:

>>> map(ord, u'你好')
[20320, 22909]
>>> map(ord, u'\u4f60\u597d')
[20320, 22909]
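The numbers above can be checked with a short sketch that runs on Python 2.7 and Python 3. The six values pdb reports are exactly the UTF-8 bytes of the two characters, each read as a separate character (which is what a Latin-1 decode does). The variable names are made up for illustration, and this only shows what the numbers correspond to; it does not establish why pdb reads the input that way.

```python
# The string as the interpreter sees it: two code points.
text = u"\u4f60\u597d"
print([ord(c) for c in text])      # [20320, 22909]

# Encode to UTF-8, then misread every byte as one character.
utf8_bytes = text.encode("utf-8")
misread = utf8_bytes.decode("latin-1")
print([ord(c) for c in misread])   # [228, 189, 160, 229, 165, 189]

# The misread string has six characters, so it cannot equal the original.
print(misread == text)             # False
```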

As to why the debugger does this, that question needs to be answered by someone else.

But when I put 'abc', 'ab', u'abc', u'ab' in pdb id function, the first two got different result but the last two got same result, it seems pdb has a special approach on unicode — want, Feb 28 '18 at 02:45
@want That's probably because the normal ASCII characters can only be represented one way. Those Chinese characters, however, can be represented as one symbol or as a sum of different modifiers. — Graipher, Feb 28 '18 at 06:04
