
I have a project where data is stored in a 2-dimensional array and each element needs to be analyzed, so I need to iterate through it. I just learned NumPy and was considering using it for this project, but after running some code I found that it ended up slower than just using a for loop.

import numpy
import time

# the same 1,000,000 x 5 data as a nested list and as a numpy array
list_ = [[num + 1 for num in range(5)] for lst in range(1000000)]
array = numpy.array([[num + 1 for num in range(5)] for lst in range(1000000)])

# time a plain nested for loop over the list
start = time.perf_counter()
count = 0
for lst in list_:
    for num in lst:
        count += 1
elapsed = time.perf_counter() - start
print(f"A default python list took {elapsed:0.5f} seconds to get through {count} elements.")

# time element-wise iteration over the array with nditer
start = time.perf_counter()
count = 0
for num in numpy.nditer(array):
    count += 1
elapsed = time.perf_counter() - start
print(f"A numpy array took {elapsed:0.5f} seconds to get through {count} elements.")

The for loop takes on average ~0.5 seconds, while NumPy takes ~0.67 seconds. My run-through of NumPy syntax was somewhat surface level, so is there something I'm missing here that could run faster than the conventional for loop?

Chris MV
  • In your example, you're still creating a list, and *then* putting it in a numpy array, and when using the numpy array, you're still getting out all the elements one at a time to operate on them, so there is very little opportunity for numpy to be faster. If you're just counting elements, use `len()` (on either); if you need to perform other operations on the elements, numpy generally has better ways of optimising those operations, but you'd need to ask about a more useful example. – Grismar Jul 21 '22 at 03:30
  • If you must iterate, it's best to use python lists. `numpy` arrays take time to create from lists, but once created, applying `numpy` methods and functions to the whole arrays is noticeably faster. Iterating on arrays as though they were lists is slower. And stay away from `nditer` unless you really need (and understand) its extra functionality. As a plain iterator it is inferior. – hpaulj Jul 21 '22 at 03:46
  • Welcome to Stack Overflow. It comes across that you are looking for a general guide to using Numpy efficiently and avoiding performance traps - that is too broad for a question here; [please try](https://meta.stackoverflow.com/questions/261592) to find one yourself. But the short version is that Numpy doesn't magically improve algorithms that are written with the same native Python code; it improves performance by using *its own data structures* and allowing you to *use pre-written C code* for part of the algorithm - especially the loops. – Karl Knechtel Jul 21 '22 at 04:01
  • Iterating over numpy arrays is slower than iterating over python lists. If you want the speed benefits of numpy, you need to use numpy methods and vectorized routines. – juanpa.arrivillaga Jul 21 '22 at 05:16

4 Answers


Native Python is slow (roughly 100x slower than C). NumPy is fast because its core loops are written in C and its data lives in densely packed, typed arrays rather than as individual Python objects. But there is overhead every time data has to cross the boundary between NumPy and native Python.
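As a tiny illustration of that boundary cost (my addition, not part of the original answer): every element you pull out of an array during Python-level iteration has to be boxed into a Python-visible object.

import numpy as np

a = np.arange(5)
print(type(a[0]))  # <class 'numpy.int64'>: each access creates a boxed scalar object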

To use numpy effectively, you need to read the docs and find the appropriate numpy function, so that numpy runs the loops internally, vectorized in C, instead of resorting to a native python loop.

Here is your code rewritten using compiled numpy functions; it's an order (or two) of magnitude faster:

import numpy as np
import time

start = time.perf_counter()
list_ = [[num + 1 for num in range(5)] for lst in range(1000000)]
elapsed = time.perf_counter() - start
print(f"{elapsed:0.3f}s = native python list initialization")

start = time.perf_counter()
array = np.full((1000000, 5), range(1, 5 + 1))  # broadcasts [1..5] into every row
elapsed = time.perf_counter() - start
print(f"{elapsed:0.3f}s = numpy array initialization for shape {array.shape}")

start = time.perf_counter()
count = 0
total = 0  # 'total' rather than 'sum', to avoid shadowing the builtin
for lst in list_:
    for num in lst:
        count += 1
        total += num
elapsed = time.perf_counter() - start
print(f"{elapsed:0.3f}s = native python list to sum {count} elements = {total}")

start = time.perf_counter()
count = array.size
total = np.sum(array)  # one compiled call over the whole array
elapsed = time.perf_counter() - start
print(f"{elapsed:0.3f}s = numpy array to sum {count} elements = {total}")

3.114s = native python list initialization
0.603s = numpy array initialization for shape (1000000, 5)
1.388s = native python list to sum 5000000 elements = 15000000
0.020s = numpy array to sum 5000000 elements = 15000000
James McGuigan

The previous answers pointed out in great detail how much faster NumPy is for particular tasks. I would like to add a holistic viewpoint on lists and NumPy arrays.

TL;DR: NumPy is fast whenever numerical computation over array-like objects is involved, sometimes by a factor of 10-100. However, looping through an array element by element (instead of leveraging vectorization), as in your case, can slow the code down so much that the computational benefit vanishes. Additionally, setting up an np.array, or applying a NumPy function to a mere scalar, introduces overhead that can cancel out the performance gain. Andrej Karpathy pointed this out in a widely shared tweet.
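As a minimal sketch of that scalar-overhead point (my addition, not part of the original benchmark): for a single number, NumPy's ufunc machinery loses to the plain math function.

import math
import numpy as np

# scalar case: NumPy's per-call dispatch overhead dominates the actual work
%timeit math.sqrt(2.0)  # thin wrapper around the C library function
%timeit np.sqrt(2.0)    # full ufunc machinery invoked for one scalar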

Why and When NumPy is faster

For one thing, NumPy arrays are densely packed, same-type data structures that benefit from locality in memory. For another, NumPy operations are implemented in fast C. In addition, NumPy uses hardware-optimized libraries (e.g. Intel's MKL) for accelerated vectorized operations on the CPU. This post goes into further detail on that.
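You can inspect that dense, homogeneous layout directly on any array (a small illustrative sketch, my addition; the attributes are standard NumPy API):

import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.dtype)                  # int64 -- one type for every element
print(a.itemsize)               # 8 bytes per element
print(a.strides)                # (32, 8): each row is a contiguous block
print(a.flags['C_CONTIGUOUS'])  # True: the whole array is one memory block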

I came up with four scenarios to test this. They are far from exhaustive, but they give an idea. Note that the plot below shows runtime on a logarithmic scale.

Initialize a list/an array

  • (0) exclusively with entries of 1: NumPy is drastically faster at creating a constant array of ones
  • (1) with index-dependent entries: NumPy is slower because a list comprehension is still used to build the data before the np.array is created

(2) Inner product: This is where NumPy shines as it was specifically set up for such vector/matrix operations. Since the inner product (and, more generally, matrix multiplication) is at the heart of many scientific computations, NumPy is king.

(3) Reduction (example: maximum): Reducing/aggregating a list or an array to a single scalar usually works faster with NumPy even though min(), max(), sum() etc. are built-in functions in Python.

map: apply functions to the list/array

  • (4) sqrt
  • (5) a user-defined function f: in this particular case, the list comprehension turned out to be faster. However, how to apply functions efficiently is a more complicated discussion (see the np.vectorize sketch after the code below).

[Plot: runtimes of scenarios (0)-(5) for lists vs. NumPy arrays, log scale]

The results stem from the code below, run on a 2.8 GHz Quad-Core Intel i7.

import numpy as np
import math

N = 1_000_000  # must be an int: range(N) rejects a float like 1e6

def f(x):
    return 0.5*x**3 - 1.0/(x + 3.0)

# initialization
%timeit v1_l = [1 for i in range(N)]
%timeit v1_n = np.ones(N)

%timeit v2_l = [i**2 for i in range(N)]
%timeit v2_n = np.array([i**2 for i in range(N)])

# %timeit discards assignments made inside the timed statement,
# so build the containers once more for the cells below
v1_l = [1 for i in range(N)]
v1_n = np.ones(N)
v2_l = [i**2 for i in range(N)]
v2_n = np.array([i**2 for i in range(N)])

# inner product
%timeit sum([v1_i * v2_i for (v1_i, v2_i) in zip(v1_l, v2_l)])
%timeit np.inner(v1_n, v2_n)

# reduction
%timeit max(v2_l)
%timeit np.max(v2_n)

# map
%timeit [math.sqrt(x) for x in v2_l]
%timeit np.sqrt(v2_n)

%timeit [f(x) for x in v2_l]
%timeit np.fromiter((f(x) for x in v2_n), dtype=float, count=len(v2_n))  # dtype=float: f returns floats
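An aside on scenario (5): np.vectorize looks like a vectorized solution, but the NumPy docs note it is provided primarily for convenience and is essentially a Python for loop, so don't expect it to beat the comprehension. A quick sketch (my addition, reusing f and v2_n from above):

f_vec = np.vectorize(f)  # convenience wrapper, not a compiled loop
%timeit f_vec(v2_n)      # typically no faster than [f(x) for x in v2_l]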

What else to consider...

If your workload is inherently loop-based, consider Cython. If you keep dealing with k-dimensional array-like structures, stay with NumPy: chances are that the subsequent numerical operations will outweigh any initial performance disadvantage you perceive.

7shoe
import numpy as np

list_2D = [[num for num in range(1, 6)] for lst in range(100000)]
arr_2D = np.array(list_2D)
%timeit -n 10 -r 7 sum([sum(list_1D) for list_1D in list_2D])
> 11.5 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit -n 10 -r 7 np.sum(arr_2D)
> 150 µs ± 10.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

For this example, the NumPy array computes the sum of all elements about 77 times faster.
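Since the question mentions analyzing each row, note that the same vectorized approach extends to per-row results via np.sum's axis argument (a small added sketch, reusing arr_2D from above):

row_sums = np.sum(arr_2D, axis=1)  # shape (100000,): one sum per row
print(row_sums[:3])                # [15 15 15]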

J. Choi

Reproducing your code, but with a much smaller range so we can actually look at the lists and arrays:

In [2]: alist = [[num + 1 for num in range(5)] for lst in range(10)]
In [3]: alist
Out[3]: 
[[1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5],
 [1, 2, 3, 4, 5]]

While we could make an array from that with np.array(alist), we can also make one by broadcasting a 5-element array against a "vertical" 10-element one:

In [4]: arr = np.arange(1,6)+np.zeros((10,1),int)
In [5]: arr
Out[5]: 
array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5]])

Your count loop:

In [6]: count = 0
   ...: for lst in alist:
   ...:     for num in lst:
   ...:         count += 1
   ...: 
In [7]: count
Out[7]: 50

And its time - here I use timeit, which repeats the run and reports an average time. In an ipython session it's very easy to use:

In [8]: %%timeit
   ...: count = 0
   ...: for lst in alist:
   ...:     for num in lst:
   ...:         count += 1
   ...: 
2.33 µs ± 0.833 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Similar iteration on the 2d array - significantly slower:

In [9]: %%timeit
   ...: count = 0
   ...: for lst in arr:
   ...:     for num in lst:
   ...:         count += 1
   ...: 
18.1 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

nditer is still slower than the list loop. Usually nditer is slower than regular iteration; here it's relatively fast only because it isn't doing anything with the num variable, so this isn't a good test of its performance.

In [10]: %%timeit
    ...: count = 0
    ...: for num in np.nditer(arr):
    ...:     count += 1
    ...: 
7 µs ± 16.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

But if we use the array as intended, we get something much better (and more so with a bigger arr):

In [11]: np.count_nonzero(arr)
Out[11]: 50
In [12]: timeit np.count_nonzero(arr)

960 ns ± 2.24 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Another way - it's not so good with this small array, but I expect it will scale better than the list loop:

In [17]: timeit (arr>0).sum()
10.2 µs ± 32.5 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In sum - numpy can be faster, if used right. But don't try to imitate python list methods with it.

hpaulj