There's something I don't quite understand about strings in Python:

>>> a = "dog"
>>> id(a)
140438787418232
>>> id("d" + "o" + "g")
140438787418232
>>> b = "dog"
>>> id(b)
140438787418232

That behaves as I'd expect. However, if I use a string with whitespace in it...

>>> a = "a dog"
>>> id(a)
140438787452384
>>> id("a" + " " + "d" + "o" + "g")
140438787452288
>>> b = "a dog"
>>> id(b)
140438787452144

The interpreter doesn't resolve the identical strings to the same memory address this time round. Why is that?

ptr
  • This is in Python 2.7. – ptr Jun 25 '14 at 15:40
  • Your first session creates code-object constants (Python will optimise constant string concatenation if the result is 20 characters or fewer); `"d" + "o" + "g"` results in a `"dog"` string being added to the code constants. That is then interned (as it is a valid identifier string). – Martijn Pieters Jun 25 '14 at 15:41
  • In your second session, the resulting strings are not valid identifiers, so they are not interned. – Martijn Pieters Jun 25 '14 at 15:42
  • Could you expand on what you mean by a "valid identifier"? (as an answer that I can accept :)) – ptr Jun 25 '14 at 15:45
  • @PeteTinkler: I already cover that in my other answer. This is now closed, I cannot add an answer to your question here. A valid identifier is [defined in the Python reference documentation](https://docs.python.org/2/reference/lexical_analysis.html#identifiers), although for interned strings the restriction on the initial character not being a digit is lifted. – Martijn Pieters Jun 25 '14 at 15:50
  • @PeteTinkler: so any string literal in your code that uses only letters, digits and underscores is interned in CPython. – Martijn Pieters Jun 25 '14 at 15:50
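
To make the comments above concrete, here is a minimal sketch of the practical upshot: compare strings with `==`, and intern explicitly if identity really matters. It assumes Python 3, where the function is `sys.intern()`; on Python 2.7 the equivalent is the built-in `intern()`. The helpers `looks_like_name` and `build` are hypothetical names made up for this illustration, and the regex only approximates the "letters, digits and underscores" rule described above. It is not CPython's actual check.

```python
import re
import sys

# Hypothetical helper (not a CPython API): approximates the rule from the
# comments above, i.e. only ASCII letters, digits and underscores.
_NAME_CHARS = re.compile(r"[A-Za-z0-9_]*\Z")

def looks_like_name(s):
    return _NAME_CHARS.match(s) is not None

def build(parts):
    # Join at run time so the compiler cannot fold the result into a constant.
    return "".join(parts)

x = build(["a", " ", "dog"])
y = build(["a", " ", "dog"])

dog_is_name_like = looks_like_name("dog")      # only letters
a_dog_is_name_like = looks_like_name("a dog")  # contains a space

equal_by_value = (x == y)                           # compares characters
same_after_intern = sys.intern(x) is sys.intern(y)  # compares identity after explicit interning

print(dog_is_name_like, a_dog_is_name_like, equal_by_value, same_after_intern)
```

This prints `True False True True`. Whether `x is y` holds without the explicit interning is an implementation detail of the interpreter, so code should not rely on `id()` or `is` to compare string contents.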
