From the documentation:
id(object)
Return the “identity” of an object. This is an integer
which is guaranteed to be unique and constant for this object during
its lifetime. Two objects with non-overlapping lifetimes may have the
same id() value.
CPython implementation detail: This is the address of the object in memory.
In CPython, the built-in function id() returns the memory address of the object, here the stored string, as the source code shows us:
static PyObject *
builtin_id(PyModuleDef *self, PyObject *v)
/*[clinic end generated code: output=0aa640785f697f65 input=5a534136419631f4]*/
{
    PyObject *id = PyLong_FromVoidPtr(v);

    if (id && PySys_Audit("builtins.id", "O", id) < 0) {
        Py_DECREF(id);
        return NULL;
    }
    return id;
}
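As a small sketch of what "the address of the object" means in practice, the snippet below turns an id() value back into a reference with ctypes. It assumes CPython, and the helper name object_at is made up for illustration. The round trip is only safe while the object is still alive.

```python
import ctypes


def object_at(address):
    """Return the Python object stored at `address` (CPython only).

    Hypothetical helper for illustration: it is only safe to call
    while the object at that address is still alive.
    """
    return ctypes.cast(address, ctypes.py_object).value


s = "".join(["ab", "cd"])  # build the string at run time
addr = id(s)               # on CPython, the object's memory address

print(object_at(addr) is s)  # the address leads back to the very same object
print(id(s) == addr)         # id() stays constant while s is alive
```

Both checks rely on s being referenced for the whole time; once the last reference is gone, the address may be handed to another object.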
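What happens is that the life of the old string ends exactly where the life of the new one begins: the two lifetimes meet but do not overlap, which is the case where the documentation allows the same id() value.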
Python guarantees the immutability of strings only as long as they are alive.
As the article suggested by @kris shows:
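import _ctypes

a = "abcd"
a += "e"
before_f_id = id(a)  # id of the string "abcde"
a += "f"             # the life of "abcde" ends here
print(a)
# Read whatever object now lives at the old address (unsafe, CPython only)
print(_ctypes.PyObj_FromPtr(before_f_id))  # prints: "abcdef"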
the old string a has ended its life, and it is not guaranteed to be retrievable from its memory location. In fact, the above example shows that the location is reused for the new string.
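The same observation can be made without reading raw memory by comparing id() before and after the append. This is only a sketch: the function name ids_around_append is made up for illustration, and whether the two values match depends on the CPython version and on the memory allocator, so no particular result should be relied on.

```python
def ids_around_append(prefix, suffix):
    """Return (id before +=, id after +=, final string) for a local string."""
    s = prefix + "e"   # a fresh string referenced only by `s`
    before = id(s)
    s += suffix        # CPython may resize the old object in place
    after = id(s)
    return before, after, s


before, after, result = ids_around_append("abcd", "f")
print(result)
print("same id reused:", before == after)  # may be True or False
```

If the ids match, the old string is gone and its address now belongs to the new one, which is allowed because the two lifetimes do not overlap.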
We can see how this is implemented under the hood by looking at the last lines of the unicode_concatenate function:
res = v;
PyUnicode_Append(&res, w);
return res;
where v and w are the operands of the expression v += w.
The function PyUnicode_Append does in fact try to reuse the same memory location for the new object. In detail:
PyUnicode_Append(PyObject **p_left, PyObject *right):
    ...
    new_len = left_len + right_len;
    if (unicode_modifiable(left)
        ...
    {
        /* append inplace */
        if (unicode_resize(p_left, new_len) != 0)
            goto error;
        /* copy 'right' into the newly allocated area of 'left' */
        _PyUnicode_FastCopyCharacters(*p_left, left_len, right, 0, right_len);
    }
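The in-place branch is only taken when the left string can be modified safely, and, to my knowledge, one of the things unicode_modifiable requires is that nothing else references it. A minimal sketch of the opposite case, with a made-up function name append_with_alias: keeping a second reference keeps the old string alive, so the result must be a different object with a different id().

```python
def append_with_alias(base, suffix):
    """Append to a string while a second name still refers to the old value."""
    s = base + "e"
    alias = s      # second reference: the old string must stay alive
    s += suffix    # cannot reuse the old object, so a new one is created
    return s, alias


s, alias = append_with_alias("abcd", "f")
print(alias)                # abcde
print(s)                    # abcdef
print(id(s) != id(alias))   # True: both objects are alive at the same time
```

Here the language guarantee applies directly: two objects whose lifetimes overlap never share an id(), and the old string keeps its value.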