How does python implement the indexing of strings?

Question

The Data Structures and Algorithms written by Goodrich says that the python array is to store a group of related variables one after another in a continuous area of the computer memory, so the index can be accessed directly by calculating the address.For example,if the memory address of the first element of the array is 2146, and each element occupies two bytes of memory, then the memory address of the sixth element is 2146+2*5=2156, so the computer can directly access address 2156 to get the sixth elements.

But I tried to verify it,only to found that the results didn't accord with the theory.

str1 = "example"
for i in range(1,6):
    print(id(str1[i])-id(str1[i-1]))

The output is as follows

Why does this happen, if the memory address is not continuous, how does python get its memory address through index and then access the element?

Does this answer your question? [How str implemented in python?](https://stackoverflow.com/questions/16797459/how-str-implemented-in-python) — GPhilo, Jul 23 '20 at 08:18
`id(str1[0])` is just doing `id('e')`. It's not going to tell you anything about the structure of `str1` — khelwood, Jul 23 '20 at 08:19
@rioV8 I don't think that this is necessarily the case. If it were the case then surely the following code would return a different value each time `id[n]` is printed for the same sub-string within the string - which it doesn't. ```python test = "test_string" for i in range(10): for n in range(len(test)): print(id(test[n])) print() ``` — JPI93, Jul 23 '20 at 10:02
Does this answer your question? [What is the id( ) function used for?](https://stackoverflow.com/questions/15667189/what-is-the-id-function-used-for) — Gray_Rhino, Jul 24 '20 at 01:43

Pietro Battiston · Answer 1 · 2020-07-23T10:56:40.313

The closest thing to a "Python array" I know of is a numpy.array, and indeed:


In [1]: import numpy as np                                                                                                                                                                       

In [2]: a = np.array([12, 4, 120, 24, 3, 0, 13, 13], dtype='int8')                                                                                                                               

In [3]: asint64 = a.view('int64')[0]                                                                                                                                                             

In [4]: for i in range(8): 
   ...:     print(asint64 % 2**(8*(i+1)) // 2**(8*(i))) 
   ...:                                                                                                                                                                                          
12
4
120
24
3
0
13
13

What is happening here is that you are first building an array of 8 numbers using each 8 bits; when you later ask numpy to consider them as a single 64 bit number, you get that it is composed by the 8 bit representation of each of the 8 numbers, juxtaposed. So indeed the original 8 integers were adiacent in memory.

In general, asking Python "tell me what is at this arbitrary memory position", or "tell me where exactly in memory is this item of a string or array" is... slightly less straigthforward.

EDIT: ... it is slightly less straightforward, but at least it alleviates any suspect that a.view is doing strange things, so here we are loooking at the exact positions in memory of sub-arrays of our array:

In [5]: for i in range(8): 
   ...:     print(a[i:].__array_interface__['data'][0]) 
   ...:                                                                                                                                                                                          
45993728
45993729
45993730
45993731
45993732
45993733
45993734
45993735

(as long as you trust .__array_interface__['data'][0] not do do strange things!)

How does this show that the original 8 integers were adiacent in memory? Maybe they're not, and `a.view` just collects them from wherever they are? — Kelly Bundy, Jul 23 '20 at 10:41
@HeapOverflow Mmmh... without looking at the code, any example is more a check than a formal proof... but I added an example that is maybe a bit more transparent. — Pietro Battiston, Jul 23 '20 at 10:57

score -1 · Answer 2 · answered Jul 23 '20 at 09:55

-1

As far as I understand, Python (or at least CPython) implements lists as dynamic arrays.

It might be worth checking out this article for a quick intro to the concept of dynamic arrays if you are unfamiliar.

However, the way that these dynamic arrays are implemented in Python is different from a "standard implementation" of the data structure - see below quote from the Wikipedia page on dynamic arrays.

... in languages like Python or Java that enforce reference semantics, the dynamic array generally will not store the actual data, but rather it will store references to the data that resides in other areas of memory. In this case, accessing items in the array sequentially will actually involve accessing multiple non-contiguous areas of memory, so the many advantages of the cache-friendliness of this data structure are lost. - Wikipedia

I have been unable to find any supporting articles or documentation around this at first glance, however I would think that strings are implemented in a similar way to how lists are within Python - with some major modifications but similar underlying logic in parts.

Strings being implemented in such a way would neatly explain why you are seeing sub-strings (Python does not have an explicit char class, a char variable is simply a string with length 1 as far as Python is concerned) within a string at non-contiguous locations within memory at runtime.

answered Jul 23 '20 at 09:55

JPI93

1,507
5
10

Thanks for your answer.Maybe you are right. But this question is still confusing.We know that the strings is Immutable class.This seems to go against the characteristics of dynamic arrays,becaues,we can't add any element into the fixed string. – wit Jul 23 '20 at 10:13
@wit No worries. Yeah, string implementation in Python is a total minefield. Python strings being immutable doesn't necessarily discount the possibility of a dynamic array like implementation under-the-hood however it does beg the question of why they would be immutable in the first place if this is how they are implemented - though a likely answer to this (in part) would be the threadsafe nature of immutable objects, increased reliability in case of using string instances as dictionary keys etc. – JPI93 Jul 23 '20 at 10:26
Python doesn't implement strings like that (at least not CPython, and I'd be surprised if other implementations did). – Kelly Bundy Jul 23 '20 at 10:43
@HeapOverflow can you point me to any resources on CPython string implementation? Would like to get this sorted out in my own head if this is the case. – JPI93 Jul 23 '20 at 11:03
I don't have a nice single source, just what I've seen here and there, including [`unicodeobject.c`](https://github.com/python/cpython/blob/master/Objects/unicodeobject.c). The actual character data is stored right there contiguously in memory (with 1, 2 or 4 bytes per character). [This](https://rushter.com/blog/python-strings-and-memory/) shows a bit about that. – Kelly Bundy Jul 23 '20 at 11:47
@HeapOverflow Thanks for sharing this stuff. Yes of course unicode character data is stored contiguously in memory - nobody's disputing that - the question is, why/how do adjacent characters within a string report non contiguous memory locations? Apologies if I'm simply misunderstanding something. – JPI93 Jul 24 '20 at 10:05
They *don't* report non-contiguous memory locations for the adjacent characters. The OP isn't asking for memory locations of the characters. They're asking for memory locations of completely separate string objects. – Kelly Bundy Jul 25 '20 at 12:14

How does python implement the indexing of strings?

2 Answers2