Why is numpy ma.average 24 times slower than arr.mean?

Question

I've found something interesting in Python's numpy. ma.average is a lot slower than arr.mean (arr is an array)

>>> arr = np.full((3, 3), -9999, dtype=float)
array([[-9999., -9999., -9999.],
       [-9999., -9999., -9999.],
       [-9999., -9999., -9999.]])

%timeit np.ma.average(arr, axis=0)
The slowest run took 49.32 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 191 µs per loop

%timeit arr.mean(axis=0)
The slowest run took 6.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.41 µs per loop

with random numbers

arr = np.random.random((3,3))

%timeit arr.mean(axis=0)
The slowest run took 6.17 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 7.78 µs per loop

%timeit np.ma.average(arr, axis=0)
1000 loops, best of 3: 186 µs per loop

--> That's nearly 24 times slower.

Documentation

numpy.ma.average(a, axis=None, weights=None, returned=False)

Return the weighted average of array over the given axis.

numpy.mean(a, axis=None, dtype=None, out=None, keepdims)

Compute the arithmetic mean along the specified axis.

Why is ma.average so much slower than arr.mean? Mathematically they are the same (correct me if I'm wrong). My guess is that it has something to do with the weighted options on ma.average but shouldn't be there a fallback if no weights are passed?

[Masked operations](https://docs.scipy.org/doc/numpy/reference/maskedarray.generic.html) (you see the ```.ma.```?) are slow! — sascha, Aug 08 '17 at 16:14
Furthermore testing on small amounts of data is not a good practice: what is the difference between 7.78us en 186us? Not much. You need to use larger matrices. — Willem Van Onsem, Aug 08 '17 at 16:16
;) thanks, did not think about that. Anyway there is no mask on the given array. found it in a older code that used to have a masked array there. — mumbala, Aug 08 '17 at 16:18
Does (probably) not matter. Different computation-model. Compare ```np.mean``` and ```np.average``` (two non-mask functions) and use bigger data! — sascha, Aug 08 '17 at 16:18
@WillemVanOnsem test it on a large amount in a project, it shortened the runtime of that part by 7. 7.78 to 186 can have a big impact if it's called often — mumbala, Aug 08 '17 at 16:19
@mumbala: I know that. I'm only saying that in future, you better publish results of tests on large batches. If here `ma.average` would redirect to `np.mean` (which it does not), even the redirection could have an enormous impact. — Willem Van Onsem, Aug 08 '17 at 16:22

MSeifert · Accepted Answer · 2017-08-08T19:28:19.403

A good way to find out why something is slower is to profile it. I'll use the 3rd party library line_profiler and the IPython command %lprun (see for example this blog) here:

%load_ext line_profiler

import numpy as np
arr = np.full((3, 3), -9999, dtype=float)

%lprun -f np.ma.average np.ma.average(arr, axis=0)

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   519                                           def average(a, axis=None, weights=None, returned=False):
   ...
   570         1         1810   1810.0     30.5      a = asarray(a)
   571         1           15     15.0      0.3      m = getmask(a)
   572                                           
   573                                               # inspired by 'average' in numpy/lib/function_base.py
   574                                           
   575         1            5      5.0      0.1      if weights is None:
   576         1         3500   3500.0     59.0          avg = a.mean(axis)
   577         1          591    591.0     10.0          scl = avg.dtype.type(a.count(axis))
   578                                               else: 
   ...
   608                                           
   609         1            7      7.0      0.1      if returned:
   610                                                   if scl.shape != avg.shape:
   611                                                       scl = np.broadcast_to(scl, avg.shape).copy()
   612                                                   return avg, scl
   613                                               else:
   614         1            5      5.0      0.1          return avg

I removed some irrelevant lines.

So actually 30% of the time is spent in np.ma.asarray (something that arr.mean doesn't have to do!).

However the relative times change drastically if you use a bigger array:

arr = np.full((1000, 1000), -9999, dtype=float)

%lprun -f np.ma.average np.ma.average(arr, axis=0)
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   519                                           def average(a, axis=None, weights=None, returned=False):
   ...
   570         1          609    609.0      7.6      a = asarray(a)
   571         1           14     14.0      0.2      m = getmask(a)
   572                                           
   573                                               # inspired by 'average' in numpy/lib/function_base.py
   574                                           
   575         1            7      7.0      0.1      if weights is None:
   576         1         6924   6924.0     86.9          avg = a.mean(axis)
   577         1          404    404.0      5.1          scl = avg.dtype.type(a.count(axis))
   578                                               else:
   ...
   609         1            6      6.0      0.1      if returned:
   610                                                   if scl.shape != avg.shape:
   611                                                       scl = np.broadcast_to(scl, avg.shape).copy()
   612                                                   return avg, scl
   613                                               else:
   614         1            6      6.0      0.1          return avg

This time the np.ma.MaskedArray.mean function almost takes up 90% of the time.

Note: You could also dig deeper and look into np.ma.asarray or np.ma.MaskedArray.count or np.ma.MaskedArray.mean and check their line profilings. But I just wanted to show that there are lots of called function that add to the overhead.

So the next question is: did the relative times between np.ndarray.mean and np.ma.average also change? And at least on my computer the difference is much lower now:

%timeit np.ma.average(arr, axis=0)
# 2.96 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit arr.mean(axis=0)
# 1.84 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

This time it's not even 2 times slower. I assume for even bigger arrays the difference will get even smaller.

This is also something that is actually quite common with NumPy:

The constant factors are quite high even for plain numpy functions (see for example my answer to the question "Performance in different vectorization method in numpy"). For np.ma these constant factors are even bigger, especially if you don't use a np.ma.MaskedArray as input. But even though the constant factors might be high, these functions excel with big arrays.

Thank you very much. There couldn't be a better answer! Also thanks for line_profiler with examples — mumbala, Aug 08 '17 at 20:15

mumbala · Answer 2 · 2017-08-08T20:22:24.823

Thanks to @WillemVanOnsem and @sascha in the comments above

edit: applies to small arrays, see accepted answer for more information

Masked operations are slow try, to avoid it:

mask = self.local_pos_history[:, 0] > -9
local_pos_hist_masked = self.local_pos_history[mask]
avg = local_pos_hist_masked.mean(axis=0)

old with masked

mask = np.ma.masked_where(self.local_pos_history > -9, self.local_pos_history)
local_pos_hist_mask = self.local_pos_history[mask].reshape(len(self.local_pos_history) // 3, 3)
avg_pos = self.local_pos_history

np.average is nearly equal to arr.mean:

%timeit np.average(arr, axis=0)
The slowest run took 5.81 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 9.89 µs per loop

%timeit np.mean(arr, axis=0)
The slowest run took 6.44 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.74 µs per loop

just for clarification still a tests on small batch

Why is numpy ma.average 24 times slower than arr.mean?

Documentation

2 Answers2