There are many reasons why this performance test does not give useful results.
Don't compare, or pay attention to, anything but the release timing. The entire point of using a language like C or C++ is to enable (static) compiler optimizations. So really, the results are the same. On the other hand, it is important to make sure that aggressive compiler optimizations don't optimize out your entire test (because the result of the computation goes unused, because of undefined behaviour anywhere in your program, or because the compiler assumes that part of the code can't actually be reached, since there would be undefined behaviour if it were reached).
`for i in [x]:` is a pointless loop: it creates a Python list of one element, and iterates once. That one iteration does `i *= a`, i.e., it multiplies `i`, which is the Numpy array. The code only works accidentally; it happens that Numpy arrays specially define `*` to do a loop and multiply each element. Which brings us to...
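To see why the loop body runs exactly once while every element still gets multiplied, here is a minimal sketch. To stay free of third-party dependencies it uses a hypothetical `Vec` class as a stand-in for a Numpy array. The class and its `__imul__` method are assumptions made purely for illustration, not Numpy's actual implementation (which performs the equivalent loop in C).

```python
class Vec:
    """Hypothetical stand-in for a Numpy array (illustration only)."""

    def __init__(self, data):
        self.data = list(data)

    def __imul__(self, factor):
        # `v *= factor` dispatches here; the per-element loop lives
        # inside the object, not in the caller's `for` statement.
        for k in range(len(self.data)):
            self.data[k] *= factor
        return self


def mult(x, a):
    iterations = 0
    for i in [x]:        # a one-element list: the body runs once
        iterations += 1
        i *= a           # `i` is `x` itself, so this calls x.__imul__(a)
    return iterations


v = Vec([1, 2, 3])
count = mult(v, 10)
print(count, v.data)  # prints: 1 [10, 20, 30]
```

The `for` statement contributes nothing here: replacing the whole loop with `x *= a` would do exactly the same work.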
The entire point of using Numpy is that it optimizes number-crunching by using code written in C behind the scenes to implement algorithms and data structures. `i` simply contains a pointer to a memory allocation that looks essentially the same as the one the C program uses, and `i *= a` does a few O(1) checks and then sets up and executes a loop that looks essentially the same as the one in the C code.
This is not reliable timing methodology, in general. That is a whole other kettle of fish. The Python standard library includes a `timeit` module intended to make timing easier and help avoid some of the more basic traps. But doing this properly in general is a research topic beyond the scope of a Stack Overflow question.
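As a rough sketch of what using `timeit` can look like: the function `scale`, the input size, and the repeat counts below are all hypothetical choices for illustration, not a recommended benchmark setup. Taking the minimum of several repeats is one common way to reduce noise from other processes, but it does not make the measurement rigorous.

```python
import timeit


def scale(values, factor):
    # Hypothetical workload: build a new list of scaled values.
    return [v * factor for v in values]


data = list(range(1000))

# Run the statement 50 times per measurement, and take 5 measurements.
timings = timeit.repeat(lambda: scale(data, 3), repeat=5, number=50)

# The fastest run is usually the least disturbed by background activity.
best = min(timings)
print(f"best of {len(timings)}: {best / 50:.3e} s per call")
```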
"But I want to see the slow performance of native Python, rather than Numpy's optimized stuff - "
If you just want to see the slow performance of Python iteration, then you need the loop to actually iterate over the elements of the array (and write them back):
    def mult(x, a):
        for i in range(len(x)):
            x[i] *= a
Except that experienced Pythonistas won't write the code that way, because `range(len(...))` is ugly. The Pythonic approach is to create a new list:
    def mult(x, a):
        return [i*a for i in x]
That will also show you the inefficiency of native Python data structures (we need to create a new list, which contains pointers to `int` objects).
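The cost of storing pointers to separate `int` objects can be estimated roughly as follows. This sketch uses the standard library's `array` module only as a stand-in for a packed buffer like the one Numpy or a C program would use. The sizes reported by `sys.getsizeof` are specific to the interpreter and platform (and CPython shares small `int` objects), so treat the numbers as an illustration rather than a measurement.

```python
import sys
from array import array

n = 1000
values = list(range(n))
packed = array("q", values)  # signed 64-bit integers stored contiguously

# A list stores one pointer per element; each pointed-to int is a
# separate object with its own header. This sum is an upper-bound style
# estimate, since small ints are shared between lists in CPython.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Raw payload of the packed buffer: element size times element count.
packed_bytes = packed.itemsize * len(packed)

print(list_bytes, packed_bytes)
```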
On my machine, it is actually even slower to process the Numpy array this way than a native Python list. This is presumably because of the extra work that has to be done to interface the Numpy code with native Python, and "box" each raw integer into a separate object (a Numpy scalar such as `numpy.int64`, rather than a plain `int`).