
Numpy is supposed to be fast. However, when comparing Numpy ufuncs with standard Python functions I find that the latter are much faster.

For example,

aa = np.arange(1000000, dtype = float)
%timeit np.mean(aa) # 1000 loops, best of 3: 1.15 ms per loop
%timeit aa.mean # 10000000 loops, best of 3: 69.5 ns per loop

I got similar results with other Numpy functions like max, power. I was under the impression that Numpy has an overhead that makes it slower for small arrays but would be faster for large arrays. In the code above aa is not small: it has 1 million elements. Am I missing something?

Of course, Numpy is fast, only the functions seem to be slow:

bb = range(1000000)
%timeit mean(bb) # 1 loops, best of 3: 551 ms per loop
%timeit mean(list(bb)) # 10 loops, best of 3: 136 ms per loop
– Soldalma (edited by Saullo G. P. Castro)
  • Which version of Python are you using? In Python 2, `range()` returns a list. In Python 3, `range()` returns an iterator. This will have a massive impact on your performance measurements. – Greg Hewgill Aug 05 '13 at 01:46
  • `np.mean(aa)` and `aa.mean` seem to be the same function. – Sukrit Kalra Aug 05 '13 at 01:47
  • 8
    FYI, `aa.mean` doesn't do anything. You're not calling the function, you're only naming it. That's why it's so fast. IOW, you wanted `aa.mean()`. – DSM Aug 05 '13 at 01:48
  • @DSM : True. After doing a timeit, they seem to be similar, `aa.mean()` seems a tad bit faster though. – Sukrit Kalra Aug 05 '13 at 01:51

2 Answers

Others have already pointed out that your comparison is not a real comparison (you are not calling the function, and both versions are NumPy anyway).
But to answer the question "Are numpy functions slow?": generally speaking, no, numpy functions are not slow (or at least not slower than plain Python functions). Of course there are some side notes to make:

  • 'Slow' of course depends on what you compare with, and it can always be faster. With things like cython, numexpr, numba, calling C code, ... it is in many cases certainly possible to get faster results.
  • Numpy has a certain overhead, which can be significant in some cases. For example, as you already mentioned, numpy can be slower on small arrays and scalar math (see the sketch after this list). For a comparison on this, see e.g. Are NumPy's math functions faster than Python's?
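
To see that overhead concretely, here is a minimal sketch (illustrative only; the absolute timings depend on your machine and NumPy version) comparing a tiny input with a large one:

import timeit
import numpy as np

small_arr = np.arange(10, dtype=float)
small_lst = list(range(10))
large_arr = np.arange(1000000, dtype=float)
large_lst = list(range(1000000))

# On tiny inputs NumPy's per-call overhead dominates, so the builtin wins;
# on large inputs the C loop inside NumPy wins by a wide margin.
print("small:", timeit.timeit(lambda: np.sum(small_arr), number=10000),
      timeit.timeit(lambda: sum(small_lst), number=10000))
print("large:", timeit.timeit(lambda: np.sum(large_arr), number=100),
      timeit.timeit(lambda: sum(large_lst), number=100))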

To make the comparison you wanted to make:

In [1]: import numpy as np
In [2]: aa = np.arange(1000000)
In [3]: bb = range(1000000)

For the mean (note: there is no mean function in the Python standard library; see Calculating arithmetic mean (average) in Python):

In [4]: %timeit np.mean(aa)
100 loops, best of 3: 2.07 ms per loop

In [5]: %timeit float(sum(bb))/len(bb)
10 loops, best of 3: 69.5 ms per loop

For max, numpy vs plain python:

In [6]: %timeit np.max(aa)
1000 loops, best of 3: 1.52 ms per loop

In [7]: %timeit max(bb)
10 loops, best of 3: 31.2 ms per loop

As a final note, in the above comparison I used a numpy array (aa) for the numpy functions and a list (bb) for the plain Python functions. If you were to use a list with the numpy functions, in this case it would again be slower:

In [10]: %timeit np.max(bb)
10 loops, best of 3: 115 ms per loop

because the list is first converted to an array (which consumes most of the time). So, if you want to rely on numpy in your application, it is important to use numpy arrays to store your data (or, if you have a list, convert it to an array once so this conversion only has to be done a single time).
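
As a rough sketch of that "convert once, reuse the array" idea (the variable names are just illustrative):

import numpy as np

bb = list(range(1000000))

# Calling np.max(bb) repeatedly pays the list -> array conversion cost each time.
# Better: convert once, then keep calling NumPy functions on the array.
aa = np.asarray(bb, dtype=float)
print(aa.max(), aa.mean())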

– joris

You're not calling aa.mean. Put the function call parentheses on the end, to actually call it, and the speed difference will nearly vanish. (Both np.mean(aa) and aa.mean() are NumPy; neither uses Python builtins to do the math.)
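
A minimal illustration of the difference (a sketch; the key point is that the bare attribute does no computation at all):

import numpy as np

aa = np.arange(1000000, dtype=float)

m = aa.mean          # only looks up the bound method; nothing is computed
print(m)             # <built-in method mean of numpy.ndarray object at 0x...>
print(aa.mean())     # actually computes the mean: 499999.5
print(np.mean(aa) == aa.mean())  # True: both go through the same NumPy code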

– user2357112