I am working on a project where I have to loop through large arrays (lists), accessing each item by index. Usually this involves checking each element against a condition and then, depending on the result, updating its value.
I've noticed that this is extremely slow compared to, for example, doing a similar thing in C#. Here is a sample of simply looping through arrays and reassigning each value:
C#:
var a = new double[10000000];
var watch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < a.Length; i++)
{
    a[i] = 1.0;
}
watch.Stop();
var elapsedMs = watch.ElapsedMilliseconds;
//About 40ms
Python:
import time

a = []
for i in range(0, 10000000):
    a.append(0.0)

t1 = time.clock()
for i in range(0, 10000000):
    a[i] = 1.0
t2 = time.clock()
totalTime = t2 - t1
# About 900ms
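(As an aside, I assume timing the same loop with the timeit module would look roughly like the sketch below - the number above is still from time.clock():)

import timeit

setup = "a = [0.0] * 10000000"
stmt = """
for i in range(10000000):
    a[i] = 1.0
"""
# number=1 runs the statement once and returns the elapsed time in seconds
print(timeit.timeit(stmt, setup=setup, number=1))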
The Python code here seems to be over 20x slower. I am relatively new to Python, so I can't judge whether this kind of performance is "normal" or whether I am doing something horribly wrong here. I am running Anaconda as my Python environment, and PyCharm is my IDE.
Note: I have tried using nditer on NumPy arrays with no significant performance increase.
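(By "using nditer" I mean something along these lines - a sketch applied to the fill loop above, not necessarily the exact code I ran:)

import numpy as np

a = np.zeros(10000000)

# op_flags=['readwrite'] makes each element writable; x[...] assigns back into the array
for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = 1.0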
Many thanks in advance for any tips!
UPDATE: I've just compared the following two approaches:
import numpy as np

# timeit: 43ms
a = np.random.randn(1000, 1000)
a[a < 0] = 100.0

# timeit: 1650ms
a = np.random.randn(1000, 1000)
for x in np.nditer(a, op_flags=['readwrite']):
    if x < 0:
        x[...] = 100.0
Looks like the first (vectorized) approach is the way to go here...
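(Applying the same idea back to the original fill loop would, I assume, look something like the sketch below - I haven't timed it with the setup above:)

import numpy as np

# Fill: one whole-array assignment instead of a Python-level index loop
a = np.zeros(10000000)
a[:] = 1.0

# Conditional update: np.where builds a new array; the boolean mask above modifies in place
b = np.random.randn(1000, 1000)
b = np.where(b < 0, 100.0, b)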