It is well known that if `a` is a NumPy array, then `a.tolist()` is faster than `list(a)`. For example:
>>> import numpy as np
>>> big_np=np.random.randint(1,10**7,(10**7,))
>>> %timeit list(big_np)
1 loop, best of 3: 869 ms per loop
>>> %timeit big_np.tolist()
1 loop, best of 3: 306 ms per loop
That means the naive `list(a)` version is about a factor of 3 slower than the dedicated `tolist()` method.
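One thing worth noting (a quick check, not part of the timings above, assuming a 64-bit build where NumPy's default integer dtype is `int64`): the two resulting lists do not even contain the same kind of objects. `tolist()` converts every element to a plain Python `int`, whereas `list(a)` merely collects NumPy scalar objects:
>>> type(big_np.tolist()[0])   # plain Python integer
<type 'int'>
>>> type(list(big_np)[0])      # NumPy scalar object
<type 'numpy.int64'>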
However, comparing it to the performance of the built-in `array` module:
>>> import array
>>> big_arr=array.array('i', big_np)
>>> %timeit list(big_arr)
1 loop, best of 3: 312 ms per loop
we can see that one should probably say that `list(a)` is slow, rather than that `tolist()` is fast, because `list(array.array)` is as fast as the dedicated method.
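This is consistent with `array.array` also handing out plain Python `int` objects on access, so both fast versions do the same per-element work of creating a fresh `int` (again a small check, outside the original measurements):
>>> type(big_arr[0])   # array.array yields plain Python ints, like tolist()
<type 'int'>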
Another observation: the `array.array` version and `tolist()` benefit from CPython's small-integer cache (i.e. when values are in the range `[-5, 256]`), but this is not the case for `list(a)`:
# only small integers:
>>> small_np=np.random.randint(1,250, (10**7,))
>>> small_arr=array.array('i', small_np)
>>> %timeit list(small_np)
1 loop, best of 3: 873 ms per loop
>>> %timeit small_np.tolist()
10 loops, best of 3: 188 ms per loop
>>> %timeit list(small_arr)
10 loops, best of 3: 150 ms per loop
As we can see, the two fast versions become roughly twice as fast again, while the slow version stays as slow as before.
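The cache can be made visible with an identity check (a small illustration, assuming CPython, where equal `int`s in `[-5, 256]` are the same cached object):
>>> demo = np.array([7, 7])
>>> l1 = demo.tolist()
>>> l1[0] is l1[1]   # both are the cached Python int 7
True
>>> l2 = list(demo)
>>> l2[0] is l2[1]   # two freshly created numpy scalar objects
False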
My question: what slows `list(numpy.array)` down compared to `list(array.array)`?
Edit:
One more observation: for Python 2.7, `tolist()` takes longer if the integers are bigger (i.e. cannot be held by an `int32`):
>>> very_big=np.random.randint(1,10**7,(10**7,))+10**17
>>> not_so_big=np.random.randint(1,10**7,(10**7,))+10**9
>>> %timeit very_big.tolist()
1 loop, best of 3: 627 ms per loop
>>> %timeit not_so_big.tolist()
1 loop, best of 3: 302 ms per loop
but it is still faster than the slow `list` version.
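A guess at what happens there (a check one can run, not a verified explanation): on builds where the C `long` is 32 bits (e.g. Python 2.7 on Windows), `sys.maxint` is `2**31 - 1`, so values above the `int32` range must be converted to the slower arbitrary-precision `long` type instead of `int`:
>>> import sys
>>> sys.maxint                      # 2**31 - 1 on 32-bit-long builds
>>> type(not_so_big.tolist()[0])    # expected: <type 'int'>  (fits in a C long)
>>> type(very_big.tolist()[0])      # expected: <type 'long'> (exceeds int32 on such builds)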