When you assign the same string literal to two variables, Python allocates only one string object. This is reasonable, since strings are immutable in Python.

>>> a = "Hello"
>>> b = "Hello"
>>> id(a)
4311984752
>>> id(b)
4311984752
>>> a is b
True

But the strange part is that when the string contains a special character (such as `!`), Python allocates two strings with exactly the same content.

>>> a = "hi!"
>>> b = "hi!"
>>> id(a)
4328663024
>>> id(b)
4317237616
>>> a is b
False

I read about this behaviour here: https://python-course.eu/python-tutorial/data-types-and-variables.php However, that guide doesn't explain why Python makes this seemingly unnecessary duplicate allocation.

My question is: what is the rationale behind Python allocating duplicate strings when the string contains a special character?

huocp
  • What should surprise you is that `a is b` is `True` in the first example, not that it is `False` in the second. The first example is an *optimization*, an *implementation detail*. There are costs associated with implementing that optimization, and the Python developers decided it wasn't worth it for strings with special characters. – juanpa.arrivillaga Feb 10 '22 at 02:38
  • https://www.tutorialspoint.com/How-can-we-change-the-id-of-an-immutable-string-in-Python says that any string containing characters other than letters, such as digits or special characters, might get a different id. hth – Varadharajan Raghavendran Feb 10 '22 at 02:40
  • Implementation details are, well, implementation details. They're not part of the language specification, and they're subject to change without notice. If you as a programmer need to know about them, something is generally wrong. If you're writing code that _depends_ on them, something is **extremely** wrong (and your code is likely to break in future language releases and/or be incompatible with alternate language implementations -- think Jython/IronPython/PyPy/etc). – Charles Duffy Feb 10 '22 at 02:40
  • "But the strange part is: when the string contains special character (like !), Python will allocate two strings with exact same content." See, *this isn't the strange part at all*. This is what you should *expect* to happen. – juanpa.arrivillaga Feb 10 '22 at 02:42
  • In any case, note that several things are potentially going on in your code. Immutable literals with equal values are usually made the same object by the *compiler* if they occur in the same code block, so consider: `def foo(): x = 'hi!'; y = 'hi!'; print(x is y)`. Since you are in a REPL, your `a = 'hi!'` and `b = 'hi!'` are in separate code blocks, so they didn't undergo "constant folding". On top of that, strings are often interned for other reasons (single-character strings, strings that are attributes of classes, strings consisting solely of identifier-like ASCII characters, that is, letters, digits, and underscores). – juanpa.arrivillaga Feb 10 '22 at 02:49
  • Thanks all! I agree it's an implementation detail. I asked simply out of curiosity. – huocp Feb 10 '22 at 02:54
  • But **more importantly**, these are all implementation details that, while interesting, shouldn't affect your code at all, since they are not guaranteed. Even if you think you know the rules from reading the source code, there can be edge cases where they don't apply! Consider the well-known fact that small ints in the range [-5, 256] are cached in CPython, but you cannot rely on this! There are edge cases that escape this optimization... `pow(10, 30, 10**30-1) is 1`, for example... – juanpa.arrivillaga Feb 10 '22 at 02:54
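
To make the points from the comments concrete, here is a minimal sketch, assuming CPython. The names `same_block`, `ia`, and `ib` are made up for illustration and are not from the thread. It shows that two equal literals in one code block usually end up as one object, that equal strings built at run time usually do not, and that `sys.intern` is the documented way to request a shared copy explicitly.

```python
import sys


def same_block():
    # Both literals live in the same code block, so the CPython compiler
    # usually stores a single constant for them. This is not guaranteed.
    x = "hi!"
    y = "hi!"
    return x is y


print(same_block())  # typically True on CPython; an implementation detail

# Build equal strings at run time so the compiler cannot merge them.
a = "".join(["hi", "!"])
b = "".join(["hi", "!"])

print(a == b)  # True: value equality is what the language guarantees
print(a is b)  # typically False on CPython; do not rely on either result

# sys.intern returns the canonical interned copy of an equal string,
# so both calls return the very same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```

The practical takeaway matches the comments: compare strings with `==`, and treat the result of `is` on strings as something that can vary between interpreters and versions.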

0 Answers