When creating a list, I thought a comprehension was recommended whenever possible because it was the fastest option. But lo and behold:

In [1]: %timeit -n1000 [0]*1000000
1000 loops, best of 3: 2.3 ms per loop

In [2]: %timeit -n1000 [0 for _ in range(1000000)]
1000 loops, best of 3: 27.1 ms per loop

In [3]: a = np.zeros(1000000, dtype=int)

In [4]: %timeit -n1000 a.tolist()
1000 loops, best of 3: 7.93 ms per loop

Even numpy.ndarray.tolist can't keep up with multiplication. Why is that?

pacholik

2 Answers

The dis module is useful for comparing the first two methods.

import dis

def list_mult():
    return [0]*1000000

dis.dis(list_mult)
#  2           0 LOAD_CONST               1 (0)
#              3 BUILD_LIST               1
#              6 LOAD_CONST               2 (1000000)
#              9 BINARY_MULTIPLY     
#             10 RETURN_VALUE        

Here, the BINARY_MULTIPLY instruction is used. On the other hand...

def list_range():
    return [0 for _ in range(1000000)]

dis.dis(list_range)
# 2           0 BUILD_LIST               0
#             3 LOAD_GLOBAL              0 (range)
#             6 LOAD_CONST               1 (1000000)
#             9 CALL_FUNCTION            1
#            12 GET_ITER            
#       >>   13 FOR_ITER                12 (to 28)
#            16 STORE_FAST               0 (_)
#            19 LOAD_CONST               2 (0)
#            22 LIST_APPEND              2
#            25 JUMP_ABSOLUTE           13
#       >>   28 RETURN_VALUE    

This function explicitly constructs a loop, and then loads 0 and appends it to the working list in each iteration. This is going to be a lot slower.

It should be noted that these two construction methods are not equivalent, particularly when the value inside the list is mutable. For example, [object()] * 10 will give you a list of 10 of the same object, while [object() for _ in range(10)] will give you a list of 10 distinct objects.
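
For example, here is a minimal sketch (standard CPython, nothing beyond the built-ins) that makes the shared-reference behaviour visible:

mult = [object()] * 3
comp = [object() for _ in range(3)]

print(mult[0] is mult[1])  # True  -- one object, repeated references
print(comp[0] is comp[1])  # False -- each iteration creates a new object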

Regarding the numpy example, this operation is close to a worst case for numpy. There is a lot of overhead in constructing numpy arrays and in converting them back to Python objects; numpy pays that cost so that the vectorized operations themselves can be fast (as noted in the comments).
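
As a rough sketch of where the tolist cost comes from (the exact scalar type shown is platform-dependent): every element has to be boxed back into a plain Python int.

import numpy as np

a = np.zeros(1000000, dtype=int)
lst = a.tolist()

print(type(a[0]))    # a numpy scalar, e.g. <class 'numpy.int64'>
print(type(lst[0]))  # <class 'int'> -- each element boxed into a Python object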

Jared Goguen

In the first case, Python creates a list in which every element is a reference to the same object, the single integer 0. That is fine here because integers are immutable, but it wouldn't behave as expected if you were repeating a mutable object: every element would actually be a reference to the same object.
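
A small sketch of that pitfall with a mutable element (a nested list):

rows = [[0]] * 3                # three references to the same inner list
rows[0].append(1)
print(rows)                     # [[0, 1], [0, 1], [0, 1]]

rows = [[0] for _ in range(3)]  # three distinct inner lists
rows[0].append(1)
print(rows)                     # [[0, 1], [0], [0]]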

In the range case, a call to range is made and the interpreter then loops over it, appending elements one at a time, which creates more overhead.

leonsas