
According to this answer, lists perform better than generators in a number of cases, for example when used together with `str.join` (since that algorithm needs two passes over the data).
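
As a rough illustration of the `str.join` case, here is a sketch to run yourself (the `words` sample is made up, and absolute timings will vary by machine and Python version):

import timeit

words = ["spam"] * 100_000

def join_list():
    # join receives a real list: it can compute the total length, then copy
    return "-".join([w.upper() for w in words])

def join_gen():
    # join receives a generator: CPython drains it into a sequence first anyway
    return "-".join(w.upper() for w in words)

print("list comprehension  :", timeit.timeit(join_list, number=100))
print("generator expression:", timeit.timeit(join_gen, number=100))

On CPython the list-comprehension variant typically comes out ahead, consistent with the linked answer.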

In the following examples, using a list comprehension seems to yield better performance than the corresponding generator expression, even though, intuitively, the list comprehension comes with the overhead of allocating and copying to additional memory, which the generator sidesteps.

In [1]: l = list(range(2_000_000))

In [2]: %timeit l[:] = [i*3 for i in range(len(l))]
190 ms ± 4.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit l[:] = (i*3 for i in range(len(l)))
261 ms ± 7.14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit l[::2] = [i*3 for i in range(len(l)//2)]
97.1 ms ± 2.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: %timeit l[::2] = (i*3 for i in range(len(l)//2))
129 ms ± 2.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [6]: %timeit l[:len(l)//2] = [i*3 for i in range(len(l)//2)]
92.6 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [7]: %timeit l[:len(l)//2] = (i*3 for i in range(len(l)//2))
118 ms ± 2.17 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Why does a list comprehension yield better performance in these cases?

a_guest
  • Could be that `l[:]` is a slice, so to make the types match up, the generator has to be converted to a list behind the scenes – C.Nivs Apr 24 '19 at 14:49
  • @C.Nivs `l[:] = ...` is equivalent to `l.__setitem__(slice(None), ...)` but why does the generator need to be converted to a list? – a_guest Apr 24 '19 at 14:55
  • From the [Python language reference](https://docs.python.org/3/reference/simple_stmts.html#assignment-statements): `If the target is a slicing: The primary expression in the reference is evaluated. It should yield a mutable sequence object (such as a list). The assigned object should be a sequence object of the same type.` Thus, a generator must be coerced to `list` type – C.Nivs Apr 24 '19 at 15:28
  • I will add, as an aside, that iterating over generators is slow. Try timing `for x in [i for i in range(10_000)]: pass` and `for x in (i for i in range(10_000)): pass`, and you will see that even if you have to do two passes with the list comprehension version, iteration is still overall faster with the list comprehension. I don't start seeing the generator expression winning until we are working with about 1_000_000 items, and even then it's only marginally faster... – juanpa.arrivillaga Apr 24 '19 at 18:48
  • @juanpa.arrivillaga Okay, but while I've used a generator expression for the sake of the example, imagine I get the generator from somewhere else. At first glance it seems wasteful that the generator is first exhausted, then copied into the original list - as opposed to overwriting the items in the list right away (for non-extended slice assignment). I understand that because the size of the original list might change during that operation it is advantageous to know the new size right from the start (though I could imagine an algorithm that does the resizing dynamically - if at all necessary). – a_guest Apr 25 '19 at 12:51

1 Answer


This answer concerns the CPython implementation only. Using a list comprehension is faster because the generator is first converted into a list anyway: the length of the assigned sequence has to be determined before the data is replaced, and a generator can't tell you its length.
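
If that implicit conversion is really what happens, then converting explicitly should land in roughly the same ballpark as passing the generator directly. A quick check you can run yourself (no numbers quoted, since they depend on machine and Python version):

l = list(range(2_000_000))

# built as a list up front, then copied into the slice
%timeit l[:] = [i*3 for i in range(len(l))]

# generator drained into a list explicitly, then copied
%timeit l[:] = list(i*3 for i in range(len(l)))

# generator drained into a list implicitly, inside the slice assignment
%timeit l[:] = (i*3 for i in range(len(l)))

On CPython the last two variants should come out close to each other, which is consistent with the conversion described below.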

For list slice assignment, this operation is handled by the amusingly named `list_ass_slice` in Objects/listobject.c. There is special-case handling for assigning a list or tuple: those can use the `PySequence_Fast` ops directly.

This is the v3.7.4 implementation of `PySequence_Fast`, where you can clearly see the type check for lists and tuples:

PyObject *
PySequence_Fast(PyObject *v, const char *m)
{
    PyObject *it;

    if (v == NULL) {
        return null_error();
    }

    if (PyList_CheckExact(v) || PyTuple_CheckExact(v)) {
        Py_INCREF(v);
        return v;
    }

    it = PyObject_GetIter(v);
    if (it == NULL) {
        if (PyErr_ExceptionMatches(PyExc_TypeError))
            PyErr_SetString(PyExc_TypeError, m);
        return NULL;
    }

    v = PySequence_List(it);
    Py_DECREF(it);

    return v;
}

A generator expression fails this type check and continues on to the fallback code, where it is converted into a list object so that the length can be determined up front.
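
In pure Python, the effect on the right-hand side can be sketched roughly like this (sequence_fast is a made-up name for illustration, not a real API):

def sequence_fast(value):
    # Lists and tuples are used as-is (the fast path above).
    if isinstance(value, (list, tuple)):
        return value
    # Anything else -- including a generator -- is drained into a new list,
    # so len() is known before any element of the target list is touched.
    return list(value)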

In the general case, a predetermined length is desirable in order to allow efficient allocation of list storage, and also to provide useful error messages with extended slice assignment:

>>> vals = (x for x in 'abc')
>>> L = [1,2,3]
>>> L[::2] = vals  # attempt assigning 3 values into 2 positions
Traceback (most recent call last):
...
ValueError: attempt to assign sequence of size 3 to extended slice of size 2
>>> L  # data unchanged
[1, 2, 3]
>>> list(vals)  # generator was fully consumed
[]
wim
  • Thanks for shedding some light on this topic. I suspected the conversion but it was not completely clear why this is necessary (apart from extended slice assignment). Looking at the C code, it seems the reason is performance in case ["`d` items are inserted"](https://github.com/python/cpython/blob/3.7/Objects/listobject.c#L655) (since "delete -d items" could be handled without knowing the new size in advance). I imagined a solution similar to `list_extend`, but that could result in unnecessarily copying data. By the way, is `l[::2]` handled by the same function (given that it takes no step size)? – a_guest Apr 24 '19 at 18:26
  • The extended slice assignment will go into [`list_ass_subscript`](https://github.com/python/cpython/blob/3.7/Objects/listobject.c#L2778). Then the same argument about usage of `PySequence_Fast` eventually applies again, [here](https://github.com/python/cpython/blob/3.7/Objects/listobject.c#L2880). – wim Apr 24 '19 at 18:33
  • Okay, thanks. I gave the C code another look and it is not completely clear why the assigned object's size has to be known up front (apart from extended slice assignment). Why can't the algorithm use a size hint similar to `list_extend` and only exhaust the iterator in case the size hint exceeds the slice's length? Otherwise the memory corresponding to the slice could just be overwritten, and if it turns out there are too many items, the iterator could still be exhausted and the list resized for the remaining items, similarly to how it is done now for the whole thing. Do you know the reason for this? – a_guest Apr 25 '19 at 21:26
  • It's the right hand side of the assignment that would need to provide a hint (via the `__length_hint__` method). But the generator can't give you any sensible size hint. It could literally be some data coming in from a socket (a typical use-case for a generator), or from a random number generator. In practice, if you know the length of the data, you often don't have a generator in the first place. Undesirable to over-complicate the typical use-case to account for a pathological edge-case, I guess? – wim Apr 25 '19 at 21:52
  • I was just wondering why `list_extend` works with length hints and dynamic resizing (instead of exhausting the iterator up front and working with the actual size) but `list_ass_slice` doesn't (though it could). The generator expression was just an example for the question, but this actually concerns any iterator, e.g. `map`, `filter` or any custom iterator. But yeah, maybe it's a niche case, and for large amounts of data, where the difference in performance becomes noticeable, people probably use numpy anyway. – a_guest Apr 25 '19 at 23:40
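
Regarding the `__length_hint__` point in the comments above, a quick way to check that a generator offers no size hint, while for example a list iterator does, is operator.length_hint (a sketch; the second argument is the default returned when no hint is available):

import operator

gen = (x * 3 for x in range(10))
print(operator.length_hint(gen, -1))   # -1: generators provide no __length_hint__

it = iter([1, 2, 3])
print(operator.length_hint(it, -1))    # 3: a list iterator knows how many items remain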