I tried this in Python 2.7:

In [1]: s = 'abc'

In [2]: s is 'abc'
Out[2]: True


In [3]: s = '.abc'

In [4]: s is '.abc'
Out[4]: False

Why does the second test return False?

Gaut
  • You're asking for an implementation detail of string interning. The spec doesn't guarantee _either_ of those will return `True`. Don't use it or rely on it. – ShadowRanger Nov 13 '16 at 17:47
  • I used to think it was safe for strings. Thanks – Gaut Nov 13 '16 at 17:50

2 Answers

The short answer: CPython detects string constants that look like identifiers and interns them automatically, so that comparing them can be done in O(1) by identity.

The CPython 2.7 source (Objects/codeobject.c) contains the following function:

#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */

static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;

    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}

It is called on every string constant in your compiled code to decide whether the string looks like an identifier and should be interned.

Programs often use strings as identifiers, for instance as keys in a dictionary or as a flag of some sort. It is important that such strings can be compared very quickly, just by checking object identity. So Python detects all such strings in your code and makes them point to a single shared object. That is why your first comparison returns True.

However, interning strings takes memory, so Python tries not to intern strings that look like natural language or free text. If a string constant contains any character other than an ASCII letter, a digit, or an underscore, it is not interned automatically. That is why '.abc' fails the test: the dot is not a name character.
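For readers who find the C hard to follow, here is a rough Python equivalent of the check above. It is only a sketch of the same rule (every character must be an ASCII letter, a digit, or an underscore); the Python names are mine and are not part of the interpreter's API.

```python
import string

# Same character set as the NAME_CHARS macro in the C source.
NAME_CHARS = frozenset(string.digits + string.ascii_letters + "_")


def all_name_chars(s):
    """Return True if every character of s is a valid name character.

    Like the C version, an empty string passes the check.
    """
    return all(ch in NAME_CHARS for ch in s)


print(all_name_chars("abc"))   # True: looks like an identifier
print(all_name_chars(".abc"))  # False: '.' is not a name character
```

Note that the check does not require the string to be a valid identifier: a string such as '123' passes too, because only the character set is tested.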

You can find more information about this here: http://guilload.com/python-string-interning/

lovasoa

The is operator tests whether two variables point to the same object, not whether two variables have the same value.

Source: https://stackoverflow.com/a/13650309/4085019

To understand how it works, let's use the id() function. For the same case you have provided:

>>> s = 'abc'
>>> id(s)
140297255717024
>>> id('abc')
140297255717024
>>> s is 'abc'
True

Here, both s and 'abc' point to the same object. Hence, s is 'abc' returns True. Similarly, when you test with '.abc':

>>> s = '.abc'
>>> id(s)
140297254722272
>>> id('.abc')
140297254722368
>>> s is '.abc'
False

So, the bottom line is that you aren't testing s == value but rather id(s) == id(value). To compare strings, use == rather than the is operator.
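A minimal sketch of the difference, using strings built at run time so that no compile-time interning is involved. The helper name build is hypothetical and only there for illustration. The result of the identity check is an implementation detail, so no output is claimed for it.

```python
def build(parts):
    # Joining at run time creates the string while the program runs,
    # instead of loading a constant stored in the compiled code.
    return "".join(parts)


a = build([".", "abc"])
b = build([".", "abc"])

# Value comparison: always True when the characters match.
print(a == b)

# Identity comparison: the result depends on the interpreter, do not rely on it.
print(a is b)
```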

Update: Why does it work in the s = 'abc' case?

CPython decides at compile time which string constants to intern. When a literal contains a character such as ., @, or /, it is not interned automatically, so each occurrence can end up as a separate object with its own id. You can force interning with the built-in intern() function. For the above case:

>>> s = intern('.abc')
>>> s is '.abc'
False
>>> s is intern('.abc')
True

In theory, comparing interned strings with is is faster than a string comparison, because it only compares ids. This is only safe when both strings have been explicitly interned.
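The session above is Python 2. As a sketch for Python 3, where the built-in intern() moved to sys.intern(), the same idea looks like this. The helper name canonical is an assumption made for the example.

```python
import sys


def canonical(s):
    # sys.intern returns the one shared copy of the string from the
    # interpreter's intern table, adding it to the table if needed.
    return sys.intern(s)


a = canonical("".join([".", "abc"]))
b = canonical("".join([".", "abc"]))

# Both names now refer to the same interned object,
# so the identity check is reliable here.
print(a is b)  # True
```

Keeping the interning in one small function makes it harder to compare an interned string against one that was never interned by accident.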

PseudoAj