
It is well known that if `a` is a numpy array, then `a.tolist()` is faster than `list(a)`, for example:

>>> import numpy as np
>>> big_np=np.random.randint(1,10**7,(10**7,))

>>> %timeit list(big_np)
1 loop, best of 3: 869 ms per loop
>>> %timeit big_np.tolist()
1 loop, best of 3: 306 ms per loop

That means the naive `list(a)` version is about a factor of 3 slower than the special-purpose function `tolist()`.

However, comparing it to the performance of the built-in `array` module:

>>> import array
>>> big_arr=array.array('i', big_np)
>>> %timeit list(big_arr)
1 loop, best of 3: 312 ms per loop

we can see that one should probably say that `list(a)` is slow rather than that `tolist()` is fast, because `array.array` is as fast as the special function.
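(Side note: the elements of `list(big_arr)` are already plain Python `int`s, i.e. the `array.array` iterator does the conversion to Python integers itself; on Python 2 the type prints as `<type 'int'>` instead.)

>>> type(list(big_arr)[0])
<class 'int'>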

Another observation: the `array.array` module and `tolist` benefit from the small-integer pool (i.e. when the values are in the range [-5, 256]), but this is not the case for `list(a)`:

##only small integers:
>>> small_np=np.random.randint(1,250, (10**7,))
>>> small_arr=array.array('i', small_np)

>>> %timeit list(small_np)
1 loop, best of 3: 873 ms per loop
>>> %timeit small_np.tolist()
10 loops, best of 3: 188 ms per loop
>>> %timeit list(small_arr)
10 loops, best of 3: 150 ms per loop

As we can see, the fast versions are about 2 times faster, but the slow version is as slow as before.
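A quick way to see the pool at work (the caching itself is a CPython implementation detail): repeated conversions of a small value return the very same `int` object, while `list(small_np)` hands out a fresh numpy scalar every time:

>>> small_np.tolist()[0] is small_np.tolist()[0]   # cached small Python int
True
>>> list(small_arr)[0] is list(small_arr)[0]       # same cached int via array.array
True
>>> list(small_np)[0] is list(small_np)[0]         # a new numpy scalar each call
False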

My question: what slows `list(numpy.array)` down compared to `list(array.array)`?

Edit:

One more observation: on Python 2.7 it takes longer if the integers are bigger (i.e. cannot be held by an int32):

>>> very_big=np.random.randint(1,10**7,(10**7,))+10**17
>>> not_so_big=np.random.randint(1,10**7,(10**7,))+10**9
>>> %timeit very_big.tolist()
1 loop, best of 3: 627 ms per loop
>>> %timeit not_so_big.tolist()
1 loop, best of 3: 302 ms per loop

but it is still faster than the slow `list` version.
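(A quick sanity check one can run; the exact output is platform-dependent, because the default numpy integer dtype is the C `long`, which is 32 bit on Windows and 64 bit on most Linux builds:)

>>> not_so_big.dtype, very_big.dtype      # e.g. (dtype('int32'), dtype('int64')) on Windows
>>> type(very_big.tolist()[0])            # int on Python 3; int or long on Python 2, depending on the platform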

ead
  • They work differently: `tolist()` converts all the values to Python primitives recursively, while `list()` creates a Python list from an iterable. More on it here: https://stackoverflow.com/questions/27890735/difference-between-listnumpy-array-and-numpy-array-tolist – Olaf Górski Apr 03 '18 at 12:23
  • @OlafGórski Doesn't `array.array` also go through an iterator, and yet it is just as fast? – ead Apr 03 '18 at 12:25
  • What's the type of the list elements in the different cases? – hpaulj Apr 03 '18 at 13:28
  • @hpaulj it is `numpy.int64` for `list(numpy-array)` and `int` otherwise – ead Apr 03 '18 at 13:47
  • `list` iterates on the 1st dimension. If the array is 2d, the result is a list of 1d arrays. In general `list` and `tolist` don't produce the same thing, and shouldn't be expected to be comparable in speed. – hpaulj Apr 03 '18 at 16:10

3 Answers


Here is a partial answer explaining your observation regarding the small-integer pool:

>>> a = np.arange(10)
>>> type(list(a)[0])
<class 'numpy.int64'>
>>> type(a.tolist()[0])
<class 'int'>

As we can see, `tolist` seems to try to create elements of native Python type, whereas the array iterator (which is what the `list` constructor uses) doesn't bother.

Indeed, the C implementation of `tolist` (source here) uses `PyArray_GETITEM`, which is the equivalent of the Python `arr[index].item()`, not, as one might assume, `arr[index]`.
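The difference can be reproduced directly at the element level (a small sketch with `a` from above; the concrete scalar type depends on the platform's default dtype):

>>> type(a[0]), type(a[0].item())
(<class 'numpy.int64'>, <class 'int'>)

So `list(a)` hands back whatever `a[index]` hands back (a numpy scalar), while `tolist()` effectively goes through `.item()` and builds native Python integers.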

Paul Panzer
  • Do you think that creating a `numpy.int64` object is slower than creating an `int` object? – ead Apr 03 '18 at 13:48
  • @ead I really don't know. If you were to force me to take a guess I'd lean towards yes, but that's just superstition based on the vague experience that "all things numpy come with overheads". – Paul Panzer Apr 03 '18 at 14:06
  • I think most of the time is spent on the creation of the int objects. Please see my last edit, which shows that for bigger values it becomes slower because another "integer" type is used internally. – ead Apr 03 '18 at 14:07
  • @ead Python `int` objects smaller than 256 are cached ([see here](https://stackoverflow.com/questions/15171695/whats-with-the-integer-cache-inside-python)), which makes using them relatively cheap. – MB-F Apr 03 '18 at 14:09
  • @ead Interesting, do you happen to know whether Python on Windows switches directly to arbitrary-size ints (which I'd guess are way more expensive) or to something like int64 first? (Assuming it uses the platform C int for small ints.) – Paul Panzer Apr 03 '18 at 14:14
  • As far as I understand, it is either `PyLong_FromLong(long ival)` or the "worst case" `_PyLong_FromNbInt` - nothing in between. For Windows `long` is 32 bit, so for bigger numbers the slower version will be taken. – ead Apr 03 '18 at 21:16

Basically, Paul Panzer's answer explains what happens: in the slow `list(...)` version, the resulting elements of the list are not Python integers but numpy scalars, e.g. `numpy.int64`. This answer just elaborates a little and connects the dots.

I didn't do any systematic profiling, but whenever I stopped in the debugger, both versions were in the routines that create the integer object, so it is very probable that this is where the lion's share of the execution time is spent and that the overhead doesn't play a big role.

In the `list(..)` version, the iterator calls `array_item`, which has special treatment for one-dimensional arrays and calls `PyArray_Scalar`, a quite generic function that doesn't use the machinery of Python's integer creation. It happens to be slower than the Python version, and there is also no integer pool for small values.

The `.tolist()` version calls `recursive_tolist`, which eventually uses Python's `PyLong_FromLong(long)`; this explains all the observed behaviors and happens to be faster than the numpy functionality (probably because this is not the normal way of using numpy, so not many optimizations were done for it).
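Rough pure-Python analogues of the two paths (just a sketch to connect the dots, not the actual C code):

>>> scalars = [x for x in big_np]         # like list(big_np): numpy scalars from the array iterator
>>> ints = [x.item() for x in big_np]     # like big_np.tolist(): Python ints via the integer-creation machinery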

There is a small difference between Python 2 and Python 3: Python 2 has two different integer classes: the more efficient `int` for numbers that fit into a C `long`, and `long` for arbitrarily big numbers. Thus, for the bigger numbers the most general (and thus more costly) path must be taken, which can also be observed.
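On Python 2 the split between the two classes can be seen directly; the boundary is the platform's `sys.maxint`:

>>> import sys                              # Python 2 only
>>> type(sys.maxint), type(sys.maxint + 1)
(<type 'int'>, <type 'long'>)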

ead

Constructing a list with `list(something)` iterates over `something` and collects the result of the iteration into a new list.
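Roughly, it does something like this (a sketch that ignores length hints and other optimizations):

def to_list(iterable):
    result = []
    for item in iterable:      # each iteration step produces one Python object
        result.append(item)    # the object is stored as-is, without any conversion
    return result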

If `list(small_np)` is slower than `list(small_arr)`, one could assume that iterating over `small_np` is slower than iterating over `small_arr`. Let's verify that:

%timeit for i in small_np: pass   # 1 loop, best of 3: 916 ms per loop
%timeit for i in small_arr: pass  # 1 loop, best of 3: 261 ms per loop

Yep, iterating over a numpy array seems to be slower. This is where I have to start speculating. Numpy arrays are flexible: they can have an arbitrary number of dimensions with various strides. `array.array` objects are always flat. This flexibility likely comes at a cost, which manifests in more complex iteration.
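One concrete manifestation of that flexibility: iterating over a numpy array walks the first axis and may hand back sub-arrays, whereas `array.array` can only ever hand back scalar elements:

>>> list(np.arange(6).reshape(2, 3))    # iteration over the first axis -> 1d sub-arrays
[array([0, 1, 2]), array([3, 4, 5])]
>>> list(array.array('i', range(6)))    # array.array is always flat -> plain ints
[0, 1, 2, 3, 4, 5]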

MB-F
  • I don't believe (without further proof) that the iteration is slower because of strides or similar. It could be slower because the creation of the returned object is slower (and the question is why this is the case here). – ead Apr 03 '18 at 13:07
  • Iterating over a `numpy.ndarray` potentially iterates over subarrays while iterating over `array.array` always iterates over elements. This will *at least* require additional checks in numpy. – MB-F Apr 03 '18 at 13:22
  • @ead For `list(a)` vs `a.tolist` we can see from the source that `list(a)` goes through the iterator protocol, whereas `a.tolist` uses a simple direct loop. There doesn't seem to be any special casing of `array.array` (lists and tuples are special cased), so I would concur that it likely boils down to differences in the iterators. – Paul Panzer Apr 03 '18 at 13:26