
So, let's go through what we know:

  1. The is operator compares identity, not value, as opposed to the == operator.
  2. Python interns string literals, so "hello" is "hello" is True.
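
A minimal sketch of these two points, assuming CPython 3. The names `a`, `b`, and `c` are illustrative only. The `==` results are guaranteed by the language; the `is` results are implementation details and may differ between interpreters or versions.

```python
a = "hello"
b = "hello"                  # same literal in the same compilation unit
c = "".join(["hel", "lo"])   # equal value, but built at run time

# Value comparison: guaranteed to be True for equal strings.
print(a == b, a == c)

# Identity comparison: depends on what the interpreter chose to share.
print(a is b)   # typically True in CPython (literals are shared)
print(a is c)   # typically False in CPython (c is a fresh object)
```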

So what I don't understand is this behavior:

>>> 'h' is input()
h
True
>>> 'hj' is input()
hj
False

A single-character string counts as identical even though the two operands are not both string literals, whereas a two-character string gives the result I expect: the two objects are not identical.

Since the input() function is creating a string dynamically, the result is not being interned, which is why 'hj' and dynamically created 'hj' are not identical. But why are 'h' and dynamic 'h' identical?

Does this mean Python caches/interns all strings of length 1?

temporary_user_name
  • @Aerovistae: Can you explain why not? – Eric Oct 13 '13 at 23:09
  • The other answer does not delve into why single character strings are identical under all circumstances, even when dynamically created, which is the question here. The other answer explains how string identity testing works in general. – temporary_user_name Oct 13 '13 at 23:11
  • The answer is "because the CPython developers felt it would be handy (and not too costly)". It's implementation-defined behavior, and you should not rely upon it, since it might work differently in another interpreter, or even a different version of CPython. You should never rely on any strings from different sources having the same `id`. – Blckknght Oct 13 '13 at 23:12

2 Answers


From the CPython 2 source code (Objects/stringobject.c):

PyObject *
PyString_FromStringAndSize(const char *str, Py_ssize_t size)
{
    // ...
    if (size == 1 && str != NULL &&
        (op = characters[*str & UCHAR_MAX]) != NULL)
    {
        Py_INCREF(op);
        return (PyObject *)op;
    }
    // ...
}

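CPython 2 caches every single-character string in the `characters` array, so creating a one-character string returns the already-existing object instead of allocating a new one.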
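
The excerpt above is C code from CPython 2. As a rough way to observe a similar effect from Python, here is a hedged sketch assuming CPython 3, which also shares some one-character strings. The helpers `typed` and `compare` are hypothetical names introduced for illustration; `typed` stands in for `input()` by reading from an in-memory stream so the string is created at run time.

```python
import io

def typed(text):
    """Hypothetical stand-in for input(): return a string built at run time."""
    return io.StringIO(text + "\n").readline().rstrip("\n")

def compare(literal):
    """Return (equal, identical) for a string and its run-time twin."""
    dynamic = typed(literal)
    return literal == dynamic, literal is dynamic

for s in ("h", "hj"):
    equal, identical = compare(s)
    # 'equal' is always True; 'identical' depends on the interpreter.
    print(repr(s), "equal:", equal, "identical:", identical)
```

On CPython the one-character case usually reports identical objects and the two-character case does not, but only the equality result is something to rely on.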

Eric

Identical is not the same as equal. Identical means the two objects have the same memory address (and so, barring a nonsensical __eq__ or __cmp__, they must also be equal).

The Python language doesn't specify when strings should or should not occupy the same memory address. Since strings are immutable, implementations may choose to "intern" them as an optimisation.

In PyPy, for example:

Python 2.7.2 (1.9+dfsg-1, Jun 19 2012, 23:23:45)
[PyPy 1.9.0 with GCC 4.7.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``<fijal> I love pypy''
>>>> 'h' is raw_input()
h
False
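
If code genuinely needs equal strings to be the same object, explicit interning is the documented route rather than relying on what the interpreter happens to share. A minimal sketch, assuming Python 3, where the function is `sys.intern` (in Python 2 it is the builtin `intern`); `runtime_string` is a hypothetical helper used only to build strings at run time.

```python
import sys

def runtime_string(parts):
    """Hypothetical helper: build a string at run time from its parts."""
    return "".join(parts)

# Two separately built strings with the same value.
x = sys.intern(runtime_string(["h", "j"]))
y = sys.intern(runtime_string(["h", "j"]))

# Both names now refer to the single interned object.
print(x is y)  # True
```

For ordinary comparisons, `==` remains the right operator; interning is only an optimisation.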
John La Rooy