7

I come from a C background and am a beginner in Python. I want to know how strings are actually stored in memory in Python.

I did something like

s="foo"

id(s)=140542718184424

id(s[0])= 140542719027040
id(s[1])= 140542718832152
id(s[2])= 140542718832152

I do not understand how each character is stored in memory, why the id of s is not equal to the id of s[0] (as it would be in C), and why the ids of s[1] and s[2] are the same.

  • Related: http://stackoverflow.com/questions/16756699/is-operator-behaves-differently-when-comparing-strings-with-spaces – Ashwini Chaudhary Oct 07 '13 at 11:50
  • First, in general ``id`` is not an address in memory (it is that only in CPython) and you should not think of it this way. Second, ``id(s[0])`` returns the id of a newly created character object, not of a character from the string (which do not have ids of their own). – fjarri Oct 07 '13 at 11:51
  • "How strings are stored in memory" is not defined by the language but by the implementation. There are currently at least 3 main implementations (CPython, IronPython, Jython). Also, what `id()` returns is an object's identifier, not a memory address. The fact that some implementations use memory addresses as identifiers is just, well, an implementation detail. – bruno desthuilliers Oct 07 '13 at 11:52

3 Answers

4

Python has no character type. Indexing into a string creates a new string, which (like every other object) promptly vanishes if you don't keep a reference to it around. So the id()s in your example can't be compared with each other: an object's id is only unique as long as the object lives. In particular, id(s[0]) != id(s) because the former is a new (temporary) object, and id(s[1]) == id(s[2]) because after the first expression is evaluated, the first temporary string is destroyed and the second temporary string is allocated in the previously freed memory. The latter is an implementation detail and a coincidence and cannot be relied on.

Reasoning about string memory is further complicated by implementation details like small strings (along with integers, some tuples, and more) being interned, so some_str is other_str may be true for equal strings that come from different sources (e.g. from indexing into a string with different indices).
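As a rough illustration of the points above, the following sketch keeps references to the indexed results alive so that their ids belong to live objects. The variable names `first`, `a`, and `b` are only for illustration, and whether `a is b` comes out true depends on the implementation, so no result is claimed for it.

```python
s = "foo"

# Indexing yields a str of length 1; there is no separate character type.
first = s[0]
print(type(first).__name__, len(first))  # str 1

# Hold references so both objects stay alive while their ids are compared.
a, b = s[1], s[2]
print(a == b)  # True: both hold the text "o"

# Identity is an implementation detail (caching/interning); do not rely on it.
print(a is b)

# For two live objects, equal ids and identity always agree.
print((id(a) == id(b)) == (a is b))  # True
```

Comparing ids is only meaningful while both objects exist, which is why the values are bound to names before `id()` is called.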

  • Note that even storing `s[1]` and `s[2]` in separate variables will give you the same id. This is likely because [strings are interned](http://en.wikipedia.org/wiki/String_interning). – poke Oct 07 '13 at 11:56
2

This article is good reading; it explains how strings are stored. Briefly:

When working with empty strings or ASCII strings of one character, Python uses string interning. Interned strings act as singletons: if you have two identical strings that are interned, there is only one copy of them in memory.
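The singleton behaviour of interned strings can be sketched with `sys.intern` from the Python 3 standard library, which interns a string explicitly. The strings are built at run time here so the compiler cannot merge them as constants. Which strings get interned automatically is an implementation detail and is not demonstrated.

```python
import sys

# Build two equal strings at run time from the same pieces.
parts = ["hello", "world"]
a = " ".join(parts)
b = " ".join(parts)

print(a == b)  # True: same contents

# sys.intern returns the single canonical copy held in the intern table,
# so interning two equal strings yields the very same object.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # True
```

Explicit interning like this is mainly useful when many equal strings are compared or used as dictionary keys.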

Python does not use UTF-8 internally, so that it can provide constant-time access to substrings:

s = 'hello world'
s[0]
s[7] 

Neither lookup requires scanning the string from the initial char (or, more correctly, the first substring of length 1) to the i-th position.

This is why Python uses three kinds of internal representation for Unicode strings, with 1, 2, or 4 bytes per char (Latin-1, UCS-2, or UCS-4 encoding), and does not use the space-optimised UTF-8.
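One way to observe the three widths is to measure how much a string grows per added character. This is only a sketch and assumes CPython 3.3 or later, where `sys.getsizeof` reports the size of the string object. `bytes_per_char` is a hypothetical helper written for this example, not a standard function.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per character from the size difference of two strings."""
    # The fixed object header cancels out in the subtraction.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) / n

samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),         # é
    ("BMP", "\u20ac"),           # euro sign, needs 2 bytes
    ("non-BMP", "\U0001F600"),   # emoji, needs 4 bytes
]

# On CPython 3.3+ the widths are expected to be 1, 1, 2 and 4 bytes.
for label, ch in samples:
    print(label, bytes_per_char(ch))
```

The width is chosen per string from its widest character, so a single non-BMP character makes the whole string use 4 bytes per char.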

C. Claudio
0

This is implementation dependent, but some implementations (not only of Python, other languages too) may keep a moderate-size set of constant values around for expected frequent use. In Python's case those might be values like True, None, 'o', 1, 2, etc. This way, when one of those common values is needed, there is no overhead to create it--just refer to the existing value.

John Zwinck