
For example, is the string var1 = 'ROB' stored as three memory locations, R, O and B, each with its own address, with the variable var1 pointing to the location of R? If so, how does it reach O and B?

And do other strings – for example: var2 = 'BOB' – point to the same B and O in memory that var1 refers to?

mkrieger1
variable
  • Do you have a specific programming problem which you want to solve by knowing this? Or are you asking this just out of curiosity? – mkrieger1 Jul 12 '19 at 08:03
  • @mkrieger1 - I am trying to understand the internals – variable Jul 12 '19 at 08:10
  • Maybe duplicate of https://stackoverflow.com/questions/19721002/is-a-variable-the-name-the-value-or-the-memory-location – betontalpfa Jul 12 '19 at 08:19
  • @variable read this: http://foobarnbaz.com/2012/07/08/understanding-python-variables/ – Zaraki Kenpachi Jul 12 '19 at 08:29
  • Are you asking about the CPython implementation? – mkrieger1 Jul 12 '19 at 08:39
  • And if so, about which particular version? – mkrieger1 Jul 12 '19 at 08:55
  • @mkrieger1: At this point, I treat all questions as targeting supported versions of Python 3 unless specified otherwise. The answer does differ depending on whether you're on Py2, Py3.2 or lower, or Py3.3 or higher, but 3.4 and below is no longer supported, and Py2 will stop being supported by the end of this year, so Python 3.5+ is really all that matters for new learners right now. – ShadowRanger Jul 12 '19 at 14:00

2 Answers


How strings are stored is an implementation detail, but in practice, on the CPython reference interpreter, they're stored as a C-style array of characters. So if the R is at address x, then O is at x+1 (or x+2 or x+4, depending on the largest ordinal value in the string), and B is at x+2 (or x+4 or x+8). Because the letters are stored consecutively, knowing where R is, plus a flag in the str that says how big each character's storage is, is enough to locate O and B.
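The per-character width can be observed indirectly from Python. The sketch below is an illustration rather than part of the original answer: it assumes CPython 3.3 or later, and `bytes_per_char` is a hypothetical helper name. It compares the `sys.getsizeof` of two strings that differ in length by one character.

```python
import sys

def bytes_per_char(ch):
    """Estimate the storage used per character in a string made only of `ch`.

    Assumes CPython 3.3+ (PEP 393 layout); other implementations may
    report different sizes or not support sys.getsizeof at all.
    """
    # Two strings of the same character, differing in length by one.
    # The fixed object header cancels out in the subtraction.
    return sys.getsizeof(ch * 101) - sys.getsizeof(ch * 100)

# An ASCII letter, a Latin-1 letter, a BMP symbol and an astral-plane emoji.
for ch in ("R", "\u00e9", "\u20ac", "\U0001F600"):
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per character")
```

On such a build, the width reported depends only on the largest code point in the string, which is why a single wide character makes every character in that string wider.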

'BOB' is at a completely different address, y, and its O and B are contiguous as well. The OB in 'ROB' is utterly unrelated to the OB in 'BOB'.

There is a confusing aspect to this. If you index into the strings, and check the id of the result, it will seem like 'O' has the same address in both strings. But that's only because:

  1. Indexing into a string returns a new string, unrelated to the one being indexed, and
  2. CPython caches length-one strings in the latin-1 range, so 'O' is a singleton (no matter how you make it, you get back the cached string).
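A minimal sketch of that effect, assuming CPython (the identity results are implementation details, and the variable names are only for illustration):

```python
var1 = "ROB"
var2 = "BOB"

# The two strings are distinct objects, so their ids differ.
print(id(var1) != id(var2))

# Indexing produces a one-character str object. For characters in the
# latin-1 range, CPython hands back a cached object, so both lookups
# yield the very same 'O'.
o1 = var1[1]
o2 = var2[1]
print(o1 is o2)

# Outside the latin-1 range there is no such cache, so equal characters
# obtained by indexing are usually separate objects.
s1 = "R\u20acB"
s2 = "B\u20acB"
e1 = s1[1]
e2 = s2[1]
print(e1 == e2, e1 is e2)
```

The shared id of 'O' says nothing about how 'ROB' and 'BOB' store their characters internally; it only reflects which object indexing returns.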

I'll note that the actual str internals in modern Python are even more complicated than I covered above; a single string might store the same data in up to three different encodings in the same object (the canonical form, and cached version(s) for working with specific Python C APIs). It's not visible from the Python level aside from checking the size with sys.getsizeof though, so it's not worth worrying about in general.

If you really want to head off into the weeds, feel free to read PEP 393: Flexible String Representation, which elaborates on the internals of the new str object structure adopted in CPython 3.3.

ShadowRanger
  • Hi, you said "If you index into the strings, and check the id of the result, it will seem like 'O' has the same address in both strings" - is this same concept as string interning? – variable Jul 13 '19 at 05:03
  • @variable: No. The [short latin-1 string cache](https://github.com/python/cpython/blob/3.7/Objects/unicodeobject.c#L263) is an implementation detail separate from [interning](https://github.com/python/cpython/blob/3.7/Objects/unicodeobject.c#L197). It's sort of the same concept, in the sense that caching is involved, but the implementation is unrelated. – ShadowRanger Jul 13 '19 at 13:58

This is only a partial answer:

  • var1 is a name that refers to a string object 'ROB'.
  • var2 is a name that refers to another string object 'BOB'.

How a string object stores the individual characters, and whether different string objects share the same memory, I cannot answer now in more detail than "sometimes" and "it depends". It has to do with string interning, which the interpreter may or may not apply to a given string.
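As a rough illustration of interning (a sketch, not a description of when the interpreter interns strings on its own), the standard `sys.intern` function can be used to request it explicitly. The variable names here are hypothetical:

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them.
a = "".join(["RO", "B"])
b = "".join(["R", "OB"])

print(a == b)   # equal contents
print(a is b)   # whether they share one object is implementation-dependent

# sys.intern returns the canonical copy for a given string value, so
# equal strings map to the same object once both are interned.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

Interning applies to whole string objects; it does not make 'ROB' and 'BOB' share individual characters.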

mkrieger1