6

I am new to Python and I'm currently exploring some of its core functionality.

Could you explain why the following example always returns False for a string that contains special characters?

>>> a="x"
>>> b="x"
>>> a is b
True
>>> a="xxx"
>>> b="xxx"
>>> a is b
True
>>> a="xü"
>>> b="xü"
>>> a is b
False
>>> a="ü"
>>> b="ü"
>>> a is b
True
>>> #strange: with one special character it works as expected

I understand that strings with special characters are stored at a different location on each assignment (I already checked this with the id() function), but why does Python handle strings in this inconsistent way?

NiMeDia

2 Answers

3

Python (the reference implementation, at least) has a cache for small integers and strings. My guess is that Unicode strings outside the ASCII range exceed the cache threshold (internally, Unicode is stored using 16- or 32-bit wide characters, UCS-2 or UCS-4), so they are not cached.

[edit]

Found a more complete answer at: About the changing id of a Python immutable string

See also: http://www.laurentluce.com/posts/python-string-objects-implementation/
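
To make the caching idea concrete, here is a minimal sketch, assuming Python 3 (where the function is sys.intern; in Python 2 it is the built-in intern). The strings are built at run time with join so that no compile-time caching is involved. The variable names are only illustrative, and whether the plain `a is b` check prints True or False is an implementation detail that may differ between interpreters and versions.

```python
import sys

parts = ["x", "ü"]

# Two equal strings built at run time; they are normally separate objects.
a = "".join(parts)
b = "".join(parts)

values_equal = a == b  # compares the characters
print(values_equal)
print(a is b)          # identity check; result is implementation-dependent

# sys.intern returns one canonical object per distinct string value,
# so the two interned references are the same object.
interned_a = sys.intern(a)
interned_b = sys.intern(b)
same_after_intern = interned_a is interned_b
print(same_after_intern)
```

Interning is an optimization, not something program logic should depend on.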

Paulo Scardine
0

With is, you're not testing equality between strings; you're testing whether two names refer to the same object, which is resolved by comparing pointers. So your code:

>>> a="x"
>>> b="x"
>>> a is b
True

is not asking "are a and b the same character?"; it's asking "are a and b the same object?". Since there's a small object cache (for small integers and one-byte strings, as has been said before), the answer is "yes, both variables refer to the same object in memory, the cached x character object".

When you work with an object that is not eligible for the cache, as in:

>>> a="xü"
>>> b="xü"
>>> a is b
False

what is going on is that a and b now refer to different objects in memory, so the is operator evaluates to False (a and b do not point to the same object!).

If the idea is to compare strings, you should use the == operator instead of is.
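
As a short illustration of that advice, the sketch below compares the same two strings by value and by identity. It assumes Python 3 and uses illustrative variable names; b is assembled at run time so that it is normally a distinct object from a, although the identity result itself is not guaranteed by the language.

```python
a = "xü"
b = "".join(["x", "ü"])  # built at run time, normally a separate object

equal_by_value = a == b  # True whenever the characters match
same_object = a is b     # depends on how the interpreter allocated the strings

print(equal_by_value)
print(same_object, id(a), id(b))
```

Only the == result is something code can rely on here.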

bconstanzo
  • `>>> a="xxx" >>> b="xxx" >>> a is b True` What about this? – yoopoo Aug 07 '14 at 14:36
  • In my tests, strings of ASCII characters up to 20 chars in length evaluate to the same object using is. That would be the cache working to avoid creating a new object for a string that is cacheable. YMMV; I'm running 64-bit Python 2.7.6 on 64-bit Windows 8. – bconstanzo Aug 07 '14 at 15:12
  • Be aware that GC may behave in distinct ways if your test is running on the REPL or as a script. – Paulo Scardine Aug 08 '14 at 03:11
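
One way to explore the REPL-versus-script difference mentioned in the comments is to compile the same assignments once as a single unit and once as two separate units, which roughly mimics a script versus line-by-line REPL input. This is a sketch under the assumption that the interpreter may share equal constants within one compiled unit; the unit names are hypothetical and the identity results are implementation details, so none are promised here.

```python
# One compilation unit containing both assignments (like a script).
shared = {}
exec(compile('a = "xü"\nb = "xü"\n', "<one_unit>", "exec"), shared)

# Two separate compilation units (like two lines typed into the REPL).
first, second = {}, {}
exec(compile('a = "xü"\n', "<unit_1>", "exec"), first)
exec(compile('b = "xü"\n', "<unit_2>", "exec"), second)

print("same unit:     ", shared["a"] is shared["b"])
print("separate units:", first["a"] is second["b"])
print("equal by value:", shared["a"] == first["a"] == second["b"])
```

Whatever the first two lines print on a given interpreter, the value comparison in the last line stays True.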