I've tried to understand when Python strings are identical (i.e. share the same memory location). However, in my tests I could find no obvious rule for when two equal string variables share the same memory:
import sys
print(sys.version) # 3.4.3
# Example 1
s1 = "Hello"
s2 = "Hello"
print(id(s1) == id(s2)) # True
# Example 2
s1 = "Hello" * 3
s2 = "Hello" * 3
print(id(s1) == id(s2)) # True
# Example 3
i = 3
s1 = "Hello" * i
s2 = "Hello" * i
print(id(s1) == id(s2)) # False
# Example 4
s1 = "HelloHelloHelloHelloHello"
s2 = "HelloHelloHelloHelloHello"
print(id(s1) == id(s2)) # True
# Example 5
s1 = "Hello" * 5
s2 = "Hello" * 5
print(id(s1) == id(s2)) # False
Strings are immutable, and as far as I know Python tries to re-use existing immutable objects, by having other variables point to them instead of creating new objects in memory with the same value.
With this in mind, it seems obvious that Example 1 returns True.
It's still obvious (to me) that Example 2 returns True.
It's not obvious to me that Example 3 returns False. Am I not doing the same thing as in Example 2?!?
I stumbled upon this SO question:
Why does comparing strings in Python using either '==' or 'is' sometimes produce a different result?
and read through http://guilload.com/python-string-interning/ (though I probably didn't understand all of it) and thought: okay, maybe whether a string is "interned" depends on its length, so I used HelloHelloHelloHelloHello in Example 4. The result was True.
What then puzzled me was doing the same as in Example 2, just with a bigger number (which effectively produces the same string as in Example 4). This time, however, the result was False?!?
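To make the distinction I'm asking about concrete, here is a minimal sketch that separates equality (`==`) from identity (`is`) and shows explicit interning with `sys.intern`. The helper `build` is a name I made up for illustration; it only exists to construct the string at run time. Whether the two un-interned strings share an object is, as far as I can tell, an implementation detail, so I have not annotated that line with an expected result.

```python
import sys

def build(n):
    # Hypothetical helper: build the string at run time so that the
    # compiler has no chance to pre-compute it as a constant.
    return "".join(["Hello"] * n)

s1 = build(5)
s2 = build(5)

print(s1 == s2)  # True: the values are equal
print(s1 is s2)  # identity check; depends on the implementation

# sys.intern returns the canonical copy of a string, so two equal
# strings that have both been interned refer to the same object.
t1 = sys.intern(s1)
t2 = sys.intern(s2)
print(t1 is t2)  # True
```

So explicit interning gives a guaranteed shared object, whereas in my examples the sharing happens (or doesn't) without me asking for it.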
I really have no idea how Python decides whether to reuse the same object in memory or to create a new one.
Are there any official sources that explain this behavior?