I tried this in Python 2.7:
In [1]: s = 'abc'
In [2]: s is 'abc'
Out[2]: True
In [3]: s = '.abc'
In [4]: s is '.abc'
Out[4]: False
Why does the second test return False?
The answer is: Python tries to detect which string literals look like identifiers and interns them automatically, so that comparing them can be done in O(1) by checking object identity.
In the CPython interpreter, there is the following function:
#define NAME_CHARS \
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"
/* all_name_chars(s): true iff all chars in s are valid NAME_CHARS */
static int
all_name_chars(unsigned char *s)
{
    static char ok_name_char[256];
    static unsigned char *name_chars = (unsigned char *)NAME_CHARS;
    if (ok_name_char[*name_chars] == 0) {
        unsigned char *p;
        for (p = name_chars; *p; p++)
            ok_name_char[*p] = 1;
    }
    while (*s) {
        if (ok_name_char[*s++] == 0)
            return 0;
    }
    return 1;
}
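To experiment with the rule from Python, here is a rough transcription of the C check above. It is a sketch that only mirrors the character test; it says nothing about where or when CPython actually calls the function.

```python
# A Python transcription of the C all_name_chars() check above.
NAME_CHARS = ("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
              "_abcdefghijklmnopqrstuvwxyz")


def all_name_chars(s):
    """Return True iff every character of s is in NAME_CHARS.

    Like the C loop, this returns True for the empty string,
    because no character fails the test.
    """
    return all(c in NAME_CHARS for c in s)


for literal in ('abc', '.abc', 'hello world', 'some_flag_2'):
    print("%r -> %s" % (literal, all_name_chars(literal)))
# 'abc' -> True
# '.abc' -> False
# 'hello world' -> False
# 'some_flag_2' -> True
```

The '.' in '.abc' is the only character that fails, which is exactly the difference between the two tests in the question.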
It is called on the string literals in your code to decide whether they look like identifiers and should be interned.
Programs often use strings as identifiers, for instance as dictionary keys or as flags of some sort. Comparing such strings should be very fast, ideally just a check of object identity. So Python detects such strings in your code and makes every occurrence point to a single shared object. That's why your first comparison returns True.
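The automatic interning described above applies to string literals in your code, not to strings built at runtime. A minimal sketch of getting the same identity-check behaviour for flag strings that arrive at runtime is to intern them yourself. `FLAGS` and `find_flag` are hypothetical names for illustration, and the import shim assumes `intern()` is a builtin in Python 2 and lives in `sys` in Python 3.

```python
try:
    from sys import intern  # Python 3 moved intern() into sys
except ImportError:
    pass  # Python 2: intern() is already a builtin

# Hypothetical flag names, interned once up front.
FLAGS = [intern(name) for name in ("read", "write", "append")]


def find_flag(raw):
    """Return the canonical flag object equal to raw, or None.

    raw may have been built at runtime (e.g. parsed from input), so it
    is interned first; after that an identity check is sufficient.
    """
    key = intern(raw)
    for flag in FLAGS:
        if key is flag:
            return flag
    return None


runtime_text = ''.join(['wr', 'ite'])  # equal to "write", built at runtime
print(find_flag(runtime_text) is FLAGS[1])  # True
print(find_flag('exec'))                    # None
```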
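However, interning strings takes memory, so Python tries not to intern strings that look like natural language or other text. If a string literal contains any character that is not an ASCII letter, a digit, or an underscore (_), it is not interned automatically. '.abc' contains a '.', so the two '.abc' literals in your session end up as two different objects.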
You can find more information about this here: http://guilload.com/python-string-interning/
The is operator tests whether two variables point to the same object, not whether they have the same value.
Source: https://stackoverflow.com/a/13650309/4085019
To understand how it works, let's use the id() function on the same case you provided:
>>> s = 'abc'
>>> id(s)
140297255717024
>>> id('abc')
140297255717024
>>> s is 'abc'
True
Here, both s and 'abc' point to the same object. Hence, s is 'abc' returns True. Similarly, when you test '.abc':
>>> s = '.abc'
>>> id(s)
140297254722272
>>> id('.abc')
140297254722368
>>> s is '.abc'
False
>>>
So the bottom line is that s is value does not check s == value; it checks id(s) == id(value). It is better to use a string comparison (==) than the is operator.
Update: Why does it work in the s = 'abc' case?
There is a rule behind how the ids are assigned here. Whenever the string contains a special character such as ., @, or /, the literal is not interned automatically, so each occurrence gets a new object with its own id. One way to control this is the built-in intern() function (moved to sys.intern() in Python 3). For the above case:
>>> s = intern('.abc')
>>> s is '.abc'
False
>>> s is intern('.abc')
True
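The same idea as a script that runs on Python 2 or 3 (the import shim is an assumption about where `intern` lives in each version). The helper name `same_object_after_intern` is hypothetical; it builds the same text twice at runtime and shows that interning both copies yields one shared object.

```python
try:
    from sys import intern  # Python 3 moved intern() into sys
except ImportError:
    pass  # Python 2: intern() is already a builtin


def same_object_after_intern(parts):
    """Build the same text twice at runtime and intern both copies.

    Returns (equal_values, same_object_after_interning).
    """
    first = ''.join(parts)
    second = ''.join(parts)
    return first == second, intern(first) is intern(second)


print(same_object_after_intern(['.', 'abc']))  # (True, True)
```

intern() returns the canonical copy from the interpreter's intern table, so both calls hand back the same object even though '.abc' is never interned automatically.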
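In theory, intern() plus the is operator performs better than string comparison, because is only compares ids. That shortcut is only correct when both sides are known to be interned, though: as s is '.abc' returning False above shows, a non-interned literal gives False even when the text matches.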
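A rough way to check the performance claim on your own machine, assuming CPython. The string length (10,000 characters plus '.abc') and the iteration count are arbitrary choices, and the timings will vary by machine and version. It times == on two equal but distinct strings against is on their interned copies.

```python
import timeit

# Setup code run once per timing; written as separate lines so it works
# on Python 2 and 3 (intern() is a builtin in 2 and in sys in 3).
SETUP = "\n".join([
    "try:",
    "    from sys import intern",
    "except ImportError:",
    "    pass",
    "a = 'x' * 10000 + '.abc'",
    "b = ''.join(['x' * 10000, '.abc'])",  # equal to a, distinct object
    "ia = intern(a)",
    "ib = intern(b)",                      # same object as ia
])


def compare_timings(number=100000):
    """Return (seconds for a == b, seconds for ia is ib)."""
    eq = timeit.timeit("a == b", setup=SETUP, number=number)
    ident = timeit.timeit("ia is ib", setup=SETUP, number=number)
    return eq, ident


eq_time, is_time = compare_timings()
print("a == b   : %.4f s" % eq_time)
print("ia is ib : %.4f s" % is_time)
```

Both operations are fast in absolute terms; the point of the sketch is that == on distinct objects has to look at the characters, while is never does.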