Let us first investigate what id() does:
Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
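A quick sketch of what this guarantee means in practice (the names below are purely illustrative): while two objects are alive at the same time, comparing their id() values is the same as comparing them with `is`. Two equal but separate objects get different ids.

```python
# Two names bound to the same list object, plus an equal but distinct list.
first = [1, 2]
alias = first          # same object, just another name
copy = [1, 2]          # equal value, different object

# While all three objects are alive, id() equality matches the `is` operator.
same_via_id = id(first) == id(alias)
same_via_is = first is alias
distinct_via_id = id(first) != id(copy)

print(same_via_id, same_via_is, distinct_via_id)
```

Note that this only holds for objects whose lifetimes overlap; ids of objects that no longer exist may be reused.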
OK, so, in general, when two names are pointing to the exact same object, they will have the same id().
Why then do we have this?
str1 = 'aaa'
str2 = 'aaa'
print(id(str1) == id(str2)) # True
That is because, in CPython (the C reference implementation of Python), string literals such as 'aaa' are interned, i.e. cached in an internal hash table (for performance reasons), so it is cheaper to have str1 and str2 point to the same memory.
Note that this can be done without causing unexpected behavior because strings are immutable in Python.
However, this mechanism is triggered only for strings that appear literally in the code the interpreter compiles (and, in CPython, typically only for identifier-like ones), e.g.:
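To illustrate why immutability makes this sharing safe, here is a small sketch (variable names are illustrative): "modifying" a string through one name actually builds a new string object and rebinds that name, so any other name still sees the original, unchanged value.

```python
s = 'aaa'
t = s                  # t and s refer to the same string object
original_id = id(t)

s += 'b'               # creates a new string 'aaab' and rebinds s only

# t still refers to the original object, whose value never changed.
print(s, t, id(t) == original_id)
```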
for i in range(5):
    a = eval('"' + 'a' * i + '"')
    b = eval('"' + 'a' * i + '"')
    print(id(a) == id(b), a, b)
True
True a a
True aa aa
True aaa aaa
True aaaa aaaa
Any mechanism that creates a str dynamically at runtime (i.e. other than eval(), which compiles its argument as source code) bypasses this caching, as in your example, or:
a = 'aaa'
b = a[1:]
c = 'aa'
print(id(b) == id(c))
# False
print(id(b) == id(a[1:]))
# False
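If you do want dynamically created strings to share the cached object, the standard library offers sys.intern(), which returns the interned copy of a string. A minimal sketch, reusing the names from the example above; keep in mind that for comparing string values, == is the right tool regardless of identity.

```python
import sys

a = 'aaa'
b = a[1:]      # built at runtime, not interned automatically
c = 'aa'       # literal; in CPython typically already interned at compile time

# Value equality always holds, independent of object identity.
values_equal = b == c

# sys.intern returns one canonical object for equal strings,
# so both calls yield the very same object.
interned_same = sys.intern(b) is sys.intern(c)

print(values_equal, interned_same)
```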
For further reference, the internal representation of strings in CPython is described in more detail in PEP 393.