148

Apparently list(a) doesn't overallocate, [x for x in a] overallocates at some points, and [*a] overallocates all the time?

Sizes up to n=100

Here are sizes n from 0 to 12 and the resulting sizes in bytes for the three methods:

0 56 56 56
1 64 88 88
2 72 88 96
3 80 88 104
4 88 88 112
5 96 120 120
6 104 120 128
7 112 120 136
8 120 120 152
9 128 184 184
10 136 184 192
11 144 184 200
12 152 184 208

Computed like this, reproducable at repl.it, using Python 3.8:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]))

So: How does this work? How does [*a] overallocate? Actually, what mechanism does it use to create the result list from the given input? Does it use an iterator over a and use something like list.append? Where is the source code?

(Colab with data and code that produced the images.)

Zooming in to smaller n:

Sizes up to n=40

Zooming out to larger n:

Sizes up to n=1000

Stefan Pochmann
  • 27,593
  • 8
  • 44
  • 107
  • 1
    Fwiw, extending your test cases it would seem that the list comprehension behaves as writing a loop and appending each item to the list, while `[*a]` appears to behave as using `extend` on an empty list. – jdehesa Mar 05 '20 at 16:27
  • 4
    It might help to look at the byte code generated for each. `list(a)` operates entirely in C; it can allocate the internal buffer node by node as it iterates over `a`. `[x for x in a]` just uses `LIST_APPEND` a lot, so it follows the normal "overallocate a little, reallocate when necessary" pattern of a normal list. `[*a]` uses `BUILD_LIST_UNPACK`, which... I don't know what that does, other than apparently over-allocate all the time :) – chepner Mar 05 '20 at 16:31
  • 2
    Also, in Python 3.7, it appears that `list(a)` and `[*a]` are identical, and *both* overallocate compared to `[x for x in a]`, so... `sys.getsizeof` might not be the right tool to use here. – chepner Mar 05 '20 at 16:43
  • 7
    @chepner I think `sys.getsizeof` is the right tool, it just shows that `list(a)` used to overallocate. Actually [What’s New In Python 3.8](https://docs.python.org/3/whatsnew/3.8.html) mentions it: *"The list constructor does not overallocate [...]"*. – Stefan Pochmann Mar 05 '20 at 16:52
  • 6
    @chepner: That [was a bug fixed in 3.8](https://bugs.python.org/issue33234); the constructor isn't supposed to overallocate. – ShadowRanger Mar 05 '20 at 17:51

3 Answers3

86

[*a] is internally doing the C equivalent of:

  1. Make a new, empty list
  2. Call newlist.extend(a)
  3. Returns list.

So if you expand your test to:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    l = []
    l.extend(a)
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]),
             getsizeof(l))

Try it online!

you'll see the results for getsizeof([*a]) and l = []; l.extend(a); getsizeof(l) are the same.

This is usually the right thing to do; when extending you're usually expecting to add more later, and similarly for generalized unpacking, it's assumed that multiple things will be added one after the other. [*a] is not the normal case; Python assumes there are multiple items or iterables being added to the list ([*a, b, c, *d]), so overallocation saves work in the common case.

By contrast, a list constructed from a single, presized iterable (with list()) may not grow or shrink during use, and overallocating is premature until proven otherwise; Python recently fixed a bug that made the constructor overallocate even for inputs with known size.

As for list comprehensions, they're effectively equivalent to repeated appends, so you're seeing the final result of the normal overallocation growth pattern when adding an element at a time.

To be clear, none of this is a language guarantee. It's just how CPython implements it. The Python language spec is generally unconcerned with specific growth patterns in list (aside from guaranteeing amortized O(1) appends and pops from the end). As noted in the comments, the specific implementation changes again in 3.9; while it won't affect [*a], it could affect other cases where what used to be "build a temporary tuple of individual items and then extend with the tuple" now becomes multiple applications of LIST_APPEND, which can change when the overallocation occurs and what numbers go into the calculation.

ShadowRanger
  • 143,180
  • 12
  • 188
  • 271
  • 4
    @StefanPochmann: I'd read the code before (which is why I knew this already). [This is the byte code handler for `BUILD_LIST_UNPACK`](https://github.com/python/cpython/blob/3.8/Python/ceval.c#L2703), it uses `_PyList_Extend` as the C equivalent of calling `extend` (just directly, rather than by method lookup). They combined it with the paths for building a `tuple` with unpacking; `tuple`s don't overallocate nicely for piecemeal building, so they always unpack to a `list` (to benefit from overallocation), and convert to `tuple` at the end when that's what was requested. – ShadowRanger Mar 05 '20 at 17:56
  • 4
    Note that this [apparently changes in 3.9](https://bugs.python.org/issue39320), where the construction is done with separate bytecodes (`BUILD_LIST`, `LIST_EXTEND` for each thing to unpack, `LIST_APPEND` for single items), instead of loading everything on the stack before building the whole `list` with a single byte code instruction (it allows the compiler to perform optimizations that the all-in-one instruction didn't allow, like implementing `[*a, b, *c]` as `LIST_EXTEND`, `LIST_APPEND`, `LIST_EXTEND` w/o needing to wrap `b` in a one-`tuple` to meet the requirements of `BUILD_LIST_UNPACK`). – ShadowRanger Mar 05 '20 at 18:03
18

Full picture of what happens, building on the other answers and comments (especially ShadowRanger's answer, which also explains why it's done like that).

Disassembling shows that BUILD_LIST_UNPACK gets used:

>>> import dis
>>> dis.dis('[*a]')
  1           0 LOAD_NAME                0 (a)
              2 BUILD_LIST_UNPACK        1
              4 RETURN_VALUE

That's handled in ceval.c, which builds an empty list and extends it (with a):

        case TARGET(BUILD_LIST_UNPACK): {
            ...
            PyObject *sum = PyList_New(0);
              ...
                none_val = _PyList_Extend((PyListObject *)sum, PEEK(i));

_PyList_Extend uses list_extend:

_PyList_Extend(PyListObject *self, PyObject *iterable)
{
    return list_extend(self, iterable);
}

Which calls list_resize with the sum of the sizes:

list_extend(PyListObject *self, PyObject *iterable)
    ...
        n = PySequence_Fast_GET_SIZE(iterable);
        ...
        m = Py_SIZE(self);
        ...
        if (list_resize(self, m + n) < 0) {

And that overallocates as follows:

list_resize(PyListObject *self, Py_ssize_t newsize)
{
  ...
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Let's check that. Compute the expected number of spots with the formula above, and compute the expected byte size by multiplying it with 8 (as I'm using 64-bit Python here) and adding an empty list's byte size (i.e., a list object's constant overhead):

from sys import getsizeof
for n in range(13):
    a = [None] * n
    expected_spots = n + (n >> 3) + (3 if n < 9 else 6)
    expected_bytesize = getsizeof([]) + expected_spots * 8
    real_bytesize = getsizeof([*a])
    print(n,
          expected_bytesize,
          real_bytesize,
          real_bytesize == expected_bytesize)

Output:

0 80 56 False
1 88 88 True
2 96 96 True
3 104 104 True
4 112 112 True
5 120 120 True
6 128 128 True
7 136 136 True
8 152 152 True
9 184 184 True
10 192 192 True
11 200 200 True
12 208 208 True

Matches except for n = 0, which list_extend actually shortcuts, so actually that matches, too:

        if (n == 0) {
            ...
            Py_RETURN_NONE;
        }
        ...
        if (list_resize(self, m + n) < 0) {
Stefan Pochmann
  • 27,593
  • 8
  • 44
  • 107
8

These are going to be implementation details of the CPython interpreter, and so may not be consistent across other interpreters.

That said, you can see where the comprehension and list(a) behaviors come in here:

https://github.com/python/cpython/blob/master/Objects/listobject.c#L36

Specifically for the comprehension:

 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
...

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Just below those lines, there is list_preallocate_exact which is used when calling list(a).

Randy
  • 14,349
  • 2
  • 36
  • 42
  • 1
    `[*a]` isn't appending individual elements one at a time. It's got its own dedicated bytecode, that does bulk insertion via `extend`. – ShadowRanger Mar 05 '20 at 17:59
  • Gotcha - I guess I didn't dig far enough on that. Removed the section on `[*a]` – Randy Mar 05 '20 at 18:00