How does Python manage memory for string objects?

Question

for count in range(5):
    str1 = 'a' * count
    str2 = 'a' * count
    print(id(str1) == id(str2))

Output:
    True
    True
    False
    False
    False

Why are we getting False here, when the following prints True?

str1 = 'aaa'
str2 = 'aaa'
print(id(str1) == id(str2)) # True

Can anyone explain this Python memory behaviour?

Why does it matter? If you want to test for string equality, use the `==` operator. What happens here is some kind of memory usage optimisation that's probably specific to CPython – ForceBru, Oct 19 '19 at 15:41
Thanks, but which kind of memory optimisation is CPython doing? Any idea? – Rajeev, Oct 19 '19 at 15:45
I'm really curious about this question as well. Since str1 and str2 are different strings, shouldn't they have different IDs? Or because they're pointing to the same thing, they have the same memory address? – wolfbagel, Oct 19 '19 at 15:58
This also happens with integers up to `256`. Probably some sort of optimization, not a big deal. – ori6151, Oct 19 '19 at 16:25

score 5 · Answer 1 · answered Oct 19 '19 at 16:26

Let us first investigate what id() does:

Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

CPython implementation detail: This is the address of the object in memory.

OK, so, in general, when two names are pointing to the exact same object, they will have the same id().
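A minimal sketch of that point, using lists rather than strings so that no caching can interfere. The names `first`, `alias` and `clone` are made up for illustration only.

```python
# Two names bound to one object share an id(); equal but separate objects do not.
first = ['a', 'a', 'a']
alias = first                 # a second name for the same list object
clone = list(first)           # an equal list stored in a different object

same_identity = id(first) == id(alias)    # one object, two names
clone_identity = id(first) == id(clone)   # both objects are alive, so ids differ
clone_equal = first == clone              # == compares contents, not identity

print(same_identity, clone_identity, clone_equal)   # True False True
print((first is alias) == same_identity)            # True: `is` compares identity too
```

In everyday code `is` is the usual way to compare identities, and `==` remains the right tool for comparing values.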

Why then do we have this?

str1 = 'aaa'
str2 = 'aaa'
print(id(str1) == id(str2)) # True

That is because CPython (the reference implementation of Python, written in C) interns some strings: it caches them in a hash table for performance reasons, so it is cheaper to have str1 and str2 point to the same object. This is safe because strings are immutable in Python. However, this mechanism is triggered only for strings that appear in full, as literals, in the code the interpreter compiles, e.g.:

for i in range(5):
    a = eval('"' + 'a' * i + '"')
    b = eval('"' + 'a' * i + '"')
    print(id(a) == id(b), a, b)

True  
True a a
True aa aa
True aaa aaa
True aaaa aaaa

Any mechanism that creates a str dynamically at run time (i.e. other than compiling a literal, which is what eval() does above) is outside of this caching, like your example, or:

a = 'aaa'
b = a[1:]
c = 'aa'
print(id(b) == id(c))
# False
print(id(b) == id(a[1:]))
# False
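If a shared object is actually wanted for strings built at run time, the standard library exposes `sys.intern()`. The following is only a sketch of how that could look here; the variable names `n`, `left` and `right` are made up for illustration.

```python
import sys

n = 3                           # run-time value, so no 'aaa' literal is involved
left = sys.intern('a' * n)      # built dynamically, then interned explicitly
right = sys.intern('a' * n)     # an equal string, interned the same way

print(left == right)            # True: same contents
print(left is right)            # True: sys.intern returns the single shared copy
```

Even so, `==` stays the right way to compare strings; interning is an optimisation, not something to rely on for correctness.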

For further reference, the internal representation of strings in Python is described in more detail in PEP 393.
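As for the loop in the question, the two True results for count `0` and `1` appear to come from a shortcut in string repetition rather than from the cache. Here is a small sketch that checks this, assuming CPython (other implementations may behave differently); `repeat` is a hypothetical helper used only to keep the count a run-time value.

```python
def repeat(text, count):
    # count arrives at run time, so the compiler cannot precompute the product
    return text * count

s = 'abc'
once_is_same = repeat(s, 1) is s                   # CPython hands back s itself
zero_is_same = repeat(s, 0) is repeat('xyz', 0)    # both are the shared empty string
three_is_same = repeat(s, 3) is repeat(s, 3)       # two freshly built objects

print(once_is_same, zero_is_same, three_is_same)
```

On CPython this prints `True True False`, which matches the pattern in the question: only the repetitions by 0 and 1 avoid building a new string.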

answered Oct 19 '19 at 16:26

norok2


This is a very good answer indeed! It may perhaps be worth noting a little bit about references in general in Python (such as the fact that `a = '456'` and `b = '456'` will point to the same object, while `a = 456` and `b = 456` will not). – soyapencil Oct 19 '19 at 16:29
For `int`s, this is triggered up to `256` – norok2 Oct 19 '19 at 16:34
@norok2 I understand string interning and caching, but I still don't fully understand the original loop. Why do they evaluate to False from iteration 3 onwards? – Cathal Cronin Oct 19 '19 at 16:46
@CathalCronin This happens when count is `0` or `1`, for which there is a short-circuit and no new string is actually generated – norok2 Oct 19 '19 at 18:46
@norok2 String interning operates on strings of length 20 or less, I thought? Is it because it's within a loop that this gets short-circuited to a length of 0 or 1? – Cathal Cronin Oct 19 '19 at 18:54
@CathalCronin no, the short-circuit happens on `s * 0` or `s * 1` for any string, at any point in the code. Instead of generating a dynamic string, those two operations get a pointer to the empty string and `s` itself, respectively. – norok2 Oct 19 '19 at 19:05
@norok2 Okay, is that explained in the documentation somewhere? Do you have a link that explains that in more detail? – Cathal Cronin Oct 19 '19 at 19:40
https://stackoverflow.com/q/24245324/5769463 @CathalCronin in a nutshell: strings are interned at compile time if constant folding happens (the case for length 1 strings) but not for strings created at run time (the others) – ead Oct 20 '19 at 13:52
