5

I am new to Python. I was recently confused by a syntax "[list] * k". I want to understand how Python actually executes it. Example:

>>> l = [1, 2] * 10
>>> l
[1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

Assume len(list) = n, when Python interprets it, I have following guesses with my limited knowledge.

  1. it uses list.extend(list) method. Thus it will take up O(n * k) space and use O(n * k) time.

  2. it only copies the reference of the original list and make k copy of it. Thus it will take up O(k) space and use O(k) time.

If my 2nd guess is the case, why and how following statement works?

>>> l[3] = 100
>>> l
[1, 2, 1, 100, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2]

Any official design doc, source code and PEP reference is strongly welcomed!

Follow up,

The source code link provided by @JoshLee and @MSeifert is very helpful in understanding the internal mechanism. See here at list_repeat

The following code snippet confirms the space complexity is O(n * k).

size = Py_SIZE(a) * n;

Also following code snippet confirms the time complexity is O(n * k).

p = np->ob_item;
items = a->ob_item;
for (i = 0; i < n; i++) {
    for (j = 0; j < Py_SIZE(a); j++) {
        *p = items[j];
        Py_INCREF(*p);
        p++;
    }
}
airwindow
  • 53
  • 4
  • https://wiki.python.org/moin/TimeComplexity says it's O(nk) – vaultah Jan 06 '18 at 16:35
  • Your first guess is right. – Ry- Jan 06 '18 at 16:36
  • If your 2nd guess were right then that `l[3]=100` couldn't possibly work, unless the references were lazily converted to copies. (Bear in mind that a Python list doesn't store the actual items, it stores an array of references to the items). Note that although the time complexity is O(nk) the copying is happening at C speed, so it's _much_ faster than copying the items one at a time with simple Python `for` loops. – PM 2Ring Jan 06 '18 at 17:05
  • (cont) This speed difference is so great that for small lists (eg under a couple of hundred items), you can effectively treat simple copy and extend operations as atomic O(1) operations when compared to performing such operations one by one with explicit Python loops. See [here](https://stackoverflow.com/a/47845960/4014959) for more on this topic, along with timing data. – PM 2Ring Jan 06 '18 at 17:05
  • Here's the cpython code for [list_repeat](https://github.com/python/cpython/blob/master/Objects/listobject.c#L505). – Josh Lee Jan 06 '18 at 17:27
  • @vaultah Thx for the information very much. I have actually checked this table before posting, but did not realize "Multiply" means "[list] * k" operation. – airwindow Jan 06 '18 at 22:22
  • @Ryan Thx for your confirmation! – airwindow Jan 06 '18 at 22:22
  • @JoshLee Thx a lot! The information is very helpful! – airwindow Jan 06 '18 at 22:24
  • @PM2Ring Thx for your insights very much! – airwindow Jan 06 '18 at 23:04

1 Answers1

7

A list is a shallow wrapper around an array of pointers to Python objects. So a Python list containing 1, 2, 3 would look like this:

l = [1, 2, 3]

enter image description here

If you multiply it by 5 the indices would still reference the same item, for example index 0, 3, 6, 9, 12 would store the same memory address (which explains why changing one item doesn't changed them all):

l2 = l * 5

enter image description here

However when the objects pointed to by the list were mutable, a change (as opposed to a replacement like in your example) to one of the items would be reflected in all of them:

>>> ls = [{1}, {2}, {3}]
>>> ls2 = ls*3
>>> ls[0].add(2)
>>> ls2
[{1, 2}, {2}, {3}, {1, 2}, {2}, {3}, {1, 2}, {2}, {3}]

So while this has a space and time complexity of O(n*k) it isn't as bad as it were if you would create a list containing pointers to distinct objects:

[int(i % 3 + 1) for i in range(15)]

enter image description here

Note that CPython actually reuses small integers - so the last images is inaccurate when dealing with integers like 1, 2, 3, .... It would be like the second image - but for other classes they (with a few more exceptions) would be distinct objects. However this affects only the factors, so would disappear in the O notation anyway, but it's nice to know.

The list-multiplication code is written in C (like a lot of the built-in functions and types) and you can see it here (CPython 3.6.4):

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
    Py_ssize_t i, j;
    Py_ssize_t size;
    PyListObject *np;
    PyObject **p, **items;
    PyObject *elem;
    if (n < 0)
        n = 0;
    if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
        return PyErr_NoMemory();

    /* Create an empty list: Space complexity: k * n */
    size = Py_SIZE(a) * n;
    if (size == 0)
        return PyList_New(0);
    np = (PyListObject *) PyList_New(size);
    if (np == NULL)
        return NULL;

    items = np->ob_item;

    /* Special case: The multiplied list contains one item. Time complexity: k */
    if (Py_SIZE(a) == 1) {
        elem = a->ob_item[0];
        for (i = 0; i < n; i++) {
            items[i] = elem;
            Py_INCREF(elem);
        }
        return (PyObject *) np;
    }

    /* General case: The multiplied list contains more than one item: Time complexity: n * k */
    p = np->ob_item;
    items = a->ob_item;
    for (i = 0; i < n; i++) {
        for (j = 0; j < Py_SIZE(a); j++) {
            *p = items[j];
            Py_INCREF(*p);
            p++;
        }
    }
    return (PyObject *) np;
}

The comments were added by me, just to explain the (undocumented) source code.

MSeifert
  • 145,886
  • 38
  • 333
  • 352