34

Today in class, we learned that retrieving an element from a list is O(1) in Python. Why is this the case? Suppose I have a list of four items, for example:

li = ["perry", 1, 23.5, "s"]

These items have different sizes in memory. And so it is not possible to take the memory location of li[0] and add three times the size of each element to get the memory location of li[3]. So how does the interpreter know where li[3] is without having to traverse the list in order to retrieve the element?

nabster
  • 1,561
  • 2
  • 20
  • 32
Teererai Marange
  • 2,044
  • 4
  • 32
  • 50
  • 15
    What makes you think that arrays are linearly allocated rather than a list of pointers. - [me confused by your profile description] – Morrison Chang Oct 07 '18 at 02:37
  • 29
    Don't confuse item **access**, which is ``O(1)``, with item **lookup / search**, which is ``O(n)``. – Mike Scotty Oct 07 '18 at 02:39
  • 6
    Some relevant reading material: [Python list implementation](http://www.laurentluce.com/posts/python-list-implementation/), [How is Python's List Implemented?](https://stackoverflow.com/questions/3917574/how-is-pythons-list-implemented), [What is the underlying data structure for Python lists?](https://stackoverflow.com/questions/914233/what-is-the-underlying-data-structure-for-python-lists). – Warren Weckesser Oct 07 '18 at 02:48
  • 2
    It is not good to ask [the same question](https://cs.stackexchange.com/questions/98240/how-come-retrieving-an-element-from-a-list-is-o1) on two different SE sites (when answers will be within the same context). – rus9384 Oct 07 '18 at 14:51
  • @MikeScotty : Unless you have lookup-table of some kind. – mathreadler Oct 07 '18 at 15:14
  • 2
    [Cross-posted to Computer Science](https://cs.stackexchange.com/q/98240/9550) (where, IMO, it's off-topic). Please don't post the same question to multiple Stack Exchange sites. It fragments answers and wastes people's time when they put effort into answering questions that already have answers elsewhere. – David Richerby Oct 07 '18 at 15:47
  • 1
    **Who** says they have different sizes in memory? The person who told you that believes something false. – Eric Lippert Oct 07 '18 at 15:37
  • 2
    Voting to close as off-topic, since this is language-specific and it's already been answered on Stack Overflow. – David Richerby Oct 07 '18 at 15:46
  • BTW, polymorphism without a layer of indirection would normally only be done for a serialization format (like that source text, for example). – Peter Cordes Oct 07 '18 at 19:29
  • The interpreter is not directly accessing your inhomogeneous objects. It holds the addresses of the objects. Access complexity for Python lists is actually O(2) which is also O(1). Pointer access + Access to the address pointed by the pointer. IMHO. – Ali Berat Çetin Jun 14 '22 at 11:18

3 Answers3

58

A list in Python is implemented as an array of pointers1. So, what's really happening when you create the list:

["perry", 1, 23.5, "s"]

is that you are actually creating an array of pointers like so:

[0xa3d25342, 0x635423fa, 0xff243546, 0x2545fade]

Each pointer "points" to the respective objects in memory, so that the string "perry" will be stored at address 0xa3d25342 and the number 1 will be stored at 0x635423fa, etc.

Since all pointers are the same size, the interpreter can in fact add 3 times the size of an element to the address of li[0] to get to the pointer stored at li[3].


1 Get more details from: the horse's mouth (CPython source code on GitHub).

DJG
  • 6,413
  • 4
  • 30
  • 51
  • I am sorry, but according to Mark Lutz book I've got the impression list stores references, which is a bit different than pointers, is it the same thing there and different elsewhere, or references is pointers in python? I've heard what in C it's different with a complex explanation of difference. – Dmitry Verhoturov Oct 07 '18 at 09:23
  • 5
    @DmitryVerhoturov That's right but makes no practical difference for this answer. References are reference-counted, https://docs.python.org/3/c-api/structures.html#c.PyVarObject – pipe Oct 07 '18 at 09:37
  • 8
    Reference is implemented as pointers in every language I know. Semantics may differ slightly (e.g. differences in memory management, or references in C++ being immutable), but in the end they are still pointers. – Frax Oct 07 '18 at 16:10
  • Note that in practice retrieving an element from the list is actually more like O(log n)... and in theory physical constraints limit it to no better than O(sqrt n). – TLW Oct 07 '18 at 19:31
  • 6
    @TLW I've never seen those before. Where'd you find them? – Cort Ammon Oct 08 '18 at 03:59
  • @CortAmmon: If nothing else, the bit length of a memory location is O(ln). However, that's not what TLW is driving at. I assume his point is that data needs to be stored on physical objects and that once this maximum data density is reached, one must make the object larger. If we assume memory is stored purely in 2D (which strikes me as unfair if dealing with theoretical, physical constraints), the amount/location of our memory is proportional to the 2-dimensional area of our chip. This is slightly meaningful in the real world; on-chip caches are smaller/faster than RAM. – Brian Oct 08 '18 at 12:58
  • 1
    @Brian Ahh, that makes sense. If I may expound for those who were as curious as I was, those numbers are useful for firmware designers who are doing combinatorial logic within the chips themselves. Big-oh analysis is always done with respect to some abstract machine, and when you're doing firmware, modeling time as 'gate depth' or 'wire distance' is reasonable. For anyone doing software (especially Python and other interpreted languages), it's more useful to do the big-Oh analysis based on an abstract machine where accessing memory takes some fixed number of cycles, hence the O(1) – Cort Ammon Oct 08 '18 at 14:50
  • @Brian - even assuming memory is stored in 3D there are several constraints that limit it to O(sqrt N). Among other things: 1. Bousso's bound (likely not relevant in practice), 2. Heat dissipation / power input (which presumably has a linear component per bit) can only take place on the surface of the volume. – TLW Oct 08 '18 at 18:46
  • @CortAmmon - About the only thing that the O(1) memory model has going for it is simplicity. Even in languages like Python (arguably _especially_, given the number of pointer-chases even basic operations tend to entail), memory access is nowhere near O(1) w.r.t. the size of the working set. – TLW Oct 08 '18 at 19:19
  • 3
    @TLW That simplicity is important for developers who will never operate in environments where they are concerned with performance as the working set approaches exabytes, but for which there is a substantial performance difference between an O(n) and O(n log n) algorithm with respect to their simplified computational model. The simplified model does a good job of focusing attention on the most important aspects of the algorithm. – Cort Ammon Oct 08 '18 at 19:25
17

When you say a = [...], a is effectively a pointer to a PyObject containing an array of pointers to PyObjects.

When you ask for a[2], the interpreter first follows the pointer to the list's PyObject, then adds 2 to the address of the array inside it, then returns that pointer. The same happens if you ask for a[0] or a[9999].

Basically, all Python objects are accessed by reference instead of by value, even integer literals like 2. There are just some tricks in the pointer system to keep this all efficient. And pointers have a known size, so they can be stored conveniently in C-style arrays.

Draconis
  • 3,209
  • 1
  • 19
  • 31
7

Short answer: Python lists are arrays.

Long answer: The computer science term list usually means either a singly-linked list (as used in functional programming) or a doubly-linked list (as used in procedural programming). These data structures support O(1) insertion at either the head of the list (functionally) or at any position that does not need to be searched for (procedurally). A Python ``list'' has none of these characteristics. Instead it supports (amortized) O(1) appending at the end of the list (like a C++ std::vector or Java ArrayList). Python lists are really resizable arrays in CS terms.

The following comment from the Python documentation explains some of the performance characteristics of Python ``lists'':

It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).

hkBst
  • 2,818
  • 10
  • 29
  • 1
    I’ve never heard of singly-linked lists being specifically associated with functional programming, or doubly-linked lists being specifically associated with procedural programming. Both types of list are valid and have their use-cases for both programming paradigms (and other programming paradigms besides). Can you back up that claim? I find it quite dubious. – KRyan Oct 07 '18 at 21:36
  • @KRyan I'm pretty sure that Lisp, Haskell, Ocaml are all generally using singly-linked lists, especially with the more convenient primitives in the languages. Lisp in particular has a bunch of shorthand like car/cdr for getting the various parts of the list elements. Of course they're used everywhere else, but, Lisp and functional company often makes much heavier use of them. C++'s list, for example is a doubly linked list, and only recently did they get a forward_list(which is singly typed) – violet_white Oct 07 '18 at 21:48
  • 1
    This is a good answer, but I agree that the claim about list implementations in functional vs procedural languages seems to be too general. Whether an abstract list data type is implemented as an array or linked list isn't really a part of the language specification in high level languages, is it? I suppose it's possible to make a Lisp runtime where lists are implemented as arrays, like in cpython? – Håken Lid Oct 19 '18 at 12:59
  • @HåkenLid: performance characteristics are often part of the specification of a data type, especially for languages that are more serious about performance. For example [see this Q&A about C++](https://stackoverflow.com/questions/181693/what-are-the-complexity-guarantees-of-the-standard-containers). I am not aware of such an explicit list for Python, but you can get a hint from the interface exposed by the standard ''list'' type: there is `append` and `extend` but there is no `prepend`/`cons`. – hkBst Oct 25 '18 at 09:56
  • 1
    @HåkenLid: where the docs are silent the fallback position is that the CPython implementation is the de facto specification of Python, although apparently [other list implementations do get discussed](https://www.python.org/dev/peps/pep-3128/). – hkBst Oct 25 '18 at 10:04