Why is [None] * 10 faster than [None for i in range(10)]

Question

I wanted to create a list with some initialized values, because an empty list is not an option in python. So I started thinking which would be faster: l = [None for i in range(1000)] or l = [None] * 1000 And I tried testing it with with timeit:

In [56]: timeit.timeit('l = [None] * 1000', number=10000)
Out[56]: 0.04936316597741097
In [58]: timeit.timeit('l = [None for i in range(1000)]', number=10000)
Out[58]: 0.2318978540133685

I was surprised that [None] * 1000 is faster.

Why is that (and is my method for performance testing correct) ?
Is there a faster way to initialize an "empty" list?

without getting into the (CPython) implementation (you can disassemble it), to me it is intuitive that a simple multiplication would be faster than creating a range object and looping over it with a comprehension — Chris_Rands, Oct 20 '19 at 20:30
the thing you should think about practically is if you're using a mutable `x` value, `[x]*1000` is different from `[x for _ in range(1000)]` — Chris_Rands, Oct 20 '19 at 20:34
Nvm my previous comment. Even with better testing parameters, the first is still ~14x faster, which is an even more severe difference than you tests show. — Carcigenicate, Oct 20 '19 at 20:52
Anyway, a list comprehension is essentially an interpreter level for-loop (with some optimizations). It also used `.append` so every so often, the entire underlying array is resized. If you look at the source code for list repetition using the `*` operator, it allocates the entire underlying array up front, never having to resize, and the looping is done at the C level... — juanpa.arrivillaga, Oct 20 '19 at 21:05
Here is [an answer I wrote](https://stackoverflow.com/questions/51526242/why-do-two-identical-lists-have-a-different-memory-footprint/51526419#51526419) to a related question that might be illuminating — juanpa.arrivillaga, Oct 20 '19 at 21:08
It is completely different code which produces the same result. Why wouldn't they have different speed? Maybe a better question is why do you care which one is faster? They are both fast enough and they are both something you probably don't need in your code ;) — zvone, Oct 20 '19 at 21:56
@zvone: Algorithms where a list is filled in other than in front-to-back order do often need a list of dummy values. — Davis Herring, Oct 20 '19 at 22:12
@juanpa.arrivillaga Thank you, I think that answers my question. I knew there would be some kind of interpreter optimization for this because at first glance it would look like N arrays of 1 element will be created and then concatenated. If you put that as an answer I will mark as the correct one. — evgeni tsvetanov, Oct 20 '19 at 22:31
@juanpa.arrivillaga thank you, I read the answer, interesting stuff — evgeni tsvetanov, Oct 20 '19 at 22:35
@Chris_Rands I meant an "empty list of size N". I know there isn't such a thing in Python so I put the quotation marks. I wrote it in a confusing way. — evgeni tsvetanov, Oct 20 '19 at 22:36

score 3 · Accepted Answer · answered Mar 04 '20 at 22:13

I assume you are using CPython. Let's compare the generated Python bytcode with the dis module. This is the first version:

>>> import dis
>>> def f():
...     return [None] * 1000
>>> dis.dis(f)
  2           0 LOAD_CONST               0 (None)
              2 BUILD_LIST               1
              4 LOAD_CONST               1 (1000)
              6 BINARY_MULTIPLY
              8 RETURN_VALUE

This is pretty clear: a list [None] is built (lines 0-2), and multiplied by 1000 (lines 4-6).

This is the second version:

>>> def g():
...     return [None for _ in range(1000)]
>>> dis.dis(g)
  2           0 LOAD_CONST               1 (<code object <listcomp> at ..., file "<doctest __main__[3]>", line 2>)
              2 LOAD_CONST               2 ('g.<locals>.<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_GLOBAL              0 (range)
              8 LOAD_CONST               3 (1000)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 RETURN_VALUE

This is more complex: a function named g.<locals>.<listcomp> (line 2) and with a code we'll see below (line 0) is created (line 4). A range(1000) is built (lines 6-8-10) and an iterator created (line 12). This iterator is passed to the g.<locals>.<listcomp> function (line 14) and the result returned (line 16).

Let's look at the g.<locals>.<listcomp> function:

>>> dis.dis(g.__code__.co_consts[1])
  2           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (_)
              8 LOAD_CONST               0 (None)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE

An empty list is created (line 0), the iterator parameter (iter(range(1000))) is pushed on the stack (line 2), and a for loop begins (line 4). The value of the loop index (_) is stored inb the local array (line 6) and None is appended to the list (lines 8-10), until the loop ends (line 12 that loops to line 4).

To summarize:

first version: a multiplication;
second version: a local function is created, a range is created and the iterator passed to the function; this function iterates over the iterator and appends elements one by one.

The second version is indeed slower.

Note Beware the usual pitfall

>>> A = [[0]] * 3
>>> A
[[0], [0], [0]]
>>> A[0].append(1)
>>> A
[[0, 1], [0, 1], [0, 1]]

But:

>>> A = [[0] for _ in range(3)]
>>> A
[[0], [0], [0]]
>>> A[0].append(1)
>>> A
[[0, 1], [0], [0]]

If you wonder why, look at the bytecodes above.

Why is [None] * 10 faster than [None for i in range(10)]

1 Answers1