170

I created two lists l1 and l2, but each one with a different creation method:

import sys

l1 = [None] * 10
l2 = [None for _ in range(10)]

print('Size of l1 =', sys.getsizeof(l1))
print('Size of l2 =', sys.getsizeof(l2))

But the output surprised me:

Size of l1 = 144
Size of l2 = 192

The list created with a list comprehension is a bigger size in memory, but the two lists are identical in Python otherwise.

Why is that? Is this some CPython internal thing, or some other explanation?

Moberg
  • 5,253
  • 4
  • 38
  • 54
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
  • 2
    Probably, the repetition operator will invoke some function that exactly sizes the underlying array. Note, that `144 == sys.getsizeof([]) + 8*10)` where 8 is the size of a pointer. – juanpa.arrivillaga Jul 25 '18 at 19:26
  • 1
    Note that if you change `10` to `11`, the `[None] * 11` list has size `152`, but the list comprehension still has size `192`. The previously linked question isn't an exact duplicate, but it is relevant in understanding why this happens. – Patrick Haugh Jul 25 '18 at 19:30

3 Answers3

182

When you write [None] * 10, Python knows that it will need a list of exactly 10 objects, so it allocates exactly that.

When you use a list comprehension, Python doesn't know how much it will need. So it gradually grows the list as elements are added. For each reallocation it allocates more room than is immediately needed, so that it doesn't have to reallocate for each element. The resulting list is likely to be somewhat bigger than needed.

You can see this behavior when comparing lists created with similar sizes:

>>> sys.getsizeof([None]*15)
184
>>> sys.getsizeof([None]*16)
192
>>> sys.getsizeof([None for _ in range(15)])
192
>>> sys.getsizeof([None for _ in range(16)])
192
>>> sys.getsizeof([None for _ in range(17)])
264

You can see that the first method allocates just what is needed, while the second one grows periodically. In this example, it allocates enough for 16 elements, and had to reallocate when reaching the 17th.

interjay
  • 107,303
  • 21
  • 270
  • 254
57

As noted in this question the list-comprehension uses list.append under the hood, so it will call the list-resize method, which overallocates.

To demonstrate this to yourself, you can actually use the dis dissasembler:

>>> code = compile('[x for x in iterable]', '', 'eval')
>>> import dis
>>> dis.dis(code)
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10560b810, file "", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (iterable)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x10560b810, file "", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (x)
              8 LOAD_FAST                1 (x)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE
>>>

Notice the LIST_APPEND opcode in the disassembly of the <listcomp> code object. From the docs:

LIST_APPEND(i)

Calls list.append(TOS[-i], TOS). Used to implement list comprehensions.

Now, for the list-repetition operation, we have a hint about what is going on if we consider:

>>> import sys
>>> sys.getsizeof([])
64
>>> 8*10
80
>>> 64 + 80
144
>>> sys.getsizeof([None]*10)
144

So, it seems to be able to exactly allocate the size. Looking at the source code, we see this is exactly what happens:

static PyObject *
list_repeat(PyListObject *a, Py_ssize_t n)
{
    Py_ssize_t i, j;
    Py_ssize_t size;
    PyListObject *np;
    PyObject **p, **items;
    PyObject *elem;
    if (n < 0)
        n = 0;
    if (n > 0 && Py_SIZE(a) > PY_SSIZE_T_MAX / n)
        return PyErr_NoMemory();
    size = Py_SIZE(a) * n;
    if (size == 0)
        return PyList_New(0);
    np = (PyListObject *) PyList_New(size);

Namely, here: size = Py_SIZE(a) * n;. The rest of the functions simply fills the array.

juanpa.arrivillaga
  • 88,713
  • 10
  • 131
  • 172
  • "As noted in this question the list-comprehension uses list.append under the hood" I think that it's more accurate to say that it uses `.extend()`. – Acccumulation Jul 26 '18 at 15:42
  • @Acccumulation why do you believe so? – juanpa.arrivillaga Jul 26 '18 at 15:48
  • Because it's not appending elements one-by-one. When you append elements to a list, you're really creating a new list, with a new memory allocation, and putting the list into that new memory allocation. List comprehensions, on the other hand, put most of the new elements into memory that has already been allocated, and when they run out of allocated memory, they allocate another chuck of memory, not just enough for the new element. – Acccumulation Jul 26 '18 at 15:57
  • 8
    @Acccumulation That is incorrect. `list.append` is an amortized constant time operation because when a list resizes, it overallocates. Not every append operation, therefore, results in a newly allocated array. In any event the question that I linked to shows you in the source code that in fact, list comprehensions *do* use `list.append`,. I'll be back at my laptop in a moment and I can show you the disassembled bytecode for a list comprehension and the corresponding `LIST_APPEND` opcode – juanpa.arrivillaga Jul 26 '18 at 16:04
2

None is a block of memory, but it is not a pre-specified size. In addition to that, there is some extra spacing in an array between array elements. You can see this yourself by running:

for ele in l2:
    print(sys.getsizeof(ele))

>>>>16
16
16
16
16
16
16
16
16
16

Which does not total the size of l2, but rather is less.

print(sys.getsizeof([None]))
72

And this is much greater than one tenth of the size of l1.

Your numbers should vary depending on both the details of your operating system and the details of current memory usage in your operating system. The size of [None] can never be bigger than the available adjacent memory where the variable is set to be stored, and the variable may have to be moved if it is later dynamically allocated to be larger.

scharette
  • 9,437
  • 8
  • 33
  • 67
StevenJD
  • 77
  • 6
  • 2
    `None` isn't actually stored in the underlying array, the only thing that is stored is a `PyObject` pointer (8 bytes). All Python objects are allocated on the heap. `None` is a singleton, so having a list with many nones is simply will create an array of PyObject pointers to the same `None` object on the heap (and not use additional memory in the process per additional `None`). I'm not sure what you meanby "None doesn't have a pre-specified size", but that doesn't sound correct. Finally, your loop with `getsizeof` each element isn't demonstrating what you appear to think it is demonstrating. – juanpa.arrivillaga Jul 31 '18 at 19:05
  • If as you say is true, the size of [None]*10 should be the same as the size of [None]. But clearly this is not so-- some extra storage has been added. In fact, the size of [None] repeated ten times (160) is also less than the size of [None] multiplied by ten. As you point out, clearly the size of the pointer to [None] is smaller than the size of [None] itself (16 bytes rather than 72 bytes). However, 160+32 is 192. I don't think the preceding answer solves the problem entirely either. It's clear that some extra small amount of memory (perhaps machine state dependent) is allocated. – StevenJD Aug 01 '18 at 13:12
  • "If as you say is true, the size of [None]*10 should be the same as the size of [None]" what am I saying that could possibly imply that? Again, you seem to be concentrating on the fact that the underlying buffer is over-allocated, or that the size of the list includes more than the size of the underlying buffer (it of course does), but that isn't the point of this question. Again, your use of `gestsizeof` on each `ele` of `l2` is misleading because `getsizeof(l2)` *does not take into account the size of the elements inside the container*. – juanpa.arrivillaga Aug 01 '18 at 16:00
  • To prove to yourself that last claim, do `l1 = [None]; l2 = [None]*100; l3 = [l2]` then `print(sys.getsizeof(l1), sys.getsizeof(l2), sys.getsizeof(l3))`. you'll get a result like: `72 864 72`. That is, respectively, `64 + 1*8`, `64 + 100*8`, and `64 + 1*8`, again, assuming a 64bit system with 8 byte pointer size. – juanpa.arrivillaga Aug 01 '18 at 16:02
  • Then how do you explain the size=16 of the elements of l2 when [None] is not identically replicated? – StevenJD Aug 02 '18 at 16:23
  • 2
    As I've stated, `sys.getsizeof` *does not account for the size of the items in the container. From the [docs](https://docs.python.org/3/library/sys.html#sys.getsizeof): "Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to...See [recursive sizeof](https://code.activestate.com/recipes/577504) recipe for an example of using getsizeof() recursively to find the size of containers and all their contents." – juanpa.arrivillaga Aug 02 '18 at 20:11
  • You'll note, using that recursive sizeof recipe, will give you: `total_size([None]*100) == 880`, which is precisely `16` bytes more than `sys.getsizeof([None]*100)`, i.e. `864`, because `None` is a singleton object (in any case, the repetition operator never copies objects, and simply copies references) – juanpa.arrivillaga Aug 02 '18 at 20:12