2

I read somewhere (an SO post, I think, and probably somewhere else, too) that Python automatically caches single-character strings, so not only is 'a' == 'a' true, but 'a' is 'a' is true as well.

However, I can't remember reading whether this is guaranteed behavior in Python or just implementation-specific.

Bonus points for official sources.

ThinkingStiff
Wayne Werner
  • see also: http://stackoverflow.com/questions/1504717/python-vs-is-comparing-strings-is-fails-sometimes-why and http://stackoverflow.com/questions/2987958/how-is-the-is-keyword-implemented-in-python – kriss Jul 22 '10 at 20:44
  • And why in the world do you need to know that? – Nas Banov Jul 22 '10 at 20:47
  • Curiosity mainly. And the desire to be able to tell other folks (especially beginners) expressly where it's mentioned. – Wayne Werner Jul 22 '10 at 21:01
  • You fail to understand that comparing strings (or ints) with `is` **is not desired behavior**. It's a glitch that is acceptable when it makes the implementation more efficient. – Jochen Ritzel Jul 22 '10 at 21:15
  • I guess the reason that it seems that it should be desired is that if strings are immutable, then there's no reason to store more than one copy of the string. I'm sure this is a gross error in simplification, but it makes sense to *me*, at least ;) – Wayne Werner Jul 23 '10 at 12:30

2 Answers

14

It's implementation specific. It's difficult to tell, because (as the reference says):

... for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.

The interpreter's pretty good about ensuring they're identical, but it doesn't always work:

x = u'a'
y = u'abc'[:1]        # one-character slice, built at runtime
print x == y, x is y  # Python 2 print statement

Run on CPython 2.6, this gives True False.
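
To make the quoted rule concrete, here is a minimal Python 3 sketch (the variable names are purely illustrative, not from the reference). It contrasts the mutable case, where two separately created lists are always distinct objects, with the immutable case, where the identity result is left to the implementation and only `==` is dependable.

```python
# Mutable objects: each list display creates a new, distinct object.
first_list = []
second_list = []
lists_equal = first_list == second_list      # same value
lists_identical = first_list is second_list  # never the same object

# Immutable objects: the implementation may or may not reuse an
# existing object, so only the equality result is dependable.
x = "a"
y = "abc"[:1]               # one-character slice of a longer string
strings_equal = x == y      # always True
strings_identical = x is y  # implementation detail; do not rely on it

print(lists_equal, lists_identical)
print(strings_equal, strings_identical)
```

Whatever the last `print` shows for the identity check on a given interpreter, the comparison to use in real code is `==`.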

Chris B.
  • I don't know for sure, but I would never rely on it. –  Jul 22 '10 at 20:30
  • That's why I'm looking for an official (or at least very reputable) source :P – Wayne Werner Jul 22 '10 at 20:34
  • The example given in this answer proves that you *can't* rely on it. +1 – Mark Ransom Jul 22 '10 at 20:49
  • Interesting - it gives False for unicode, but not for regular strings (at least in my interactive interpreter). But I find this explanation satisfactory. – Wayne Werner Jul 22 '10 at 21:02
  • The CPython implementation has been special-casing single-character strings for a long time, since it's relatively cheap and easy to do (as there are only 256 of them). For unicode strings, there are 1114112 possible single-character strings, so it doesn't do that. – Thomas Wouters Jul 22 '10 at 21:09
6

It is all implementation defined.

The documentation for intern says: "Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys."

That means that anything that could be a name and which is known at compile time is likely (but not guaranteed) to be the same as any other occurrences of the same name.

Other strings aren't stated to be interned. Constant strings appearing in the same compilation unit are folded together (but that is also just an implementation detail) so you get:

>>> a = '!'
>>> a is '!'
False
>>> a = 'a'
>>> a is 'a'
True
>>>

The string that contains an identifier is interned so even in different compilations you get the same string. The string that is not an identifier is only shared when in the same compilation unit:

>>> '!' is '!'
True
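
If you actually need identity rather than equality, explicit interning is the documented route. The sketch below assumes Python 3, where the `intern` built-in quoted above lives in `sys` as `sys.intern`; the helper `build` and the sample strings are made up for illustration.

```python
import sys


def build(parts):
    """Assemble a string at runtime so it is not a compile-time constant."""
    return "".join(parts)


first = build(["not an ", "identifier!"])
second = build(["not an ", "identifier!"])

print(first == second)  # equal values
print(first is second)  # implementation detail; do not rely on it

# sys.intern returns the one canonical object for a given string value.
first_interned = sys.intern(first)
second_interned = sys.intern(second)
print(first_interned is second_interned)
```

After interning, the two names refer to the same object, which is why the final identity check can be relied on while the earlier one cannot.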
Duncan