Pointwise array average ignoring NaN

Question

Possible Duplicate:
NumPy: calculate averages with NaNs removed

I have several identically-shaped numpy arrays. I want to take their pointwise average with a small twist: a np.nan value should be ignored in the averaging. In other words, average(np.array([1,2,3]), np.array([5,np.nan,7]), np.array([np.nan, 4, 2])) should equal np.array([3,3,4]).

Of course, I can do that by iterating through the elements within each numpy array, but I was hoping to avoid it. Is there a better way to implement this function?

(Python 3, but I doubt it matters.)

What you want has been answered here: http://stackoverflow.com/questions/5480694/numpy-calculate-averages-with-nans-removed — HerrKaputt, Dec 09 '12 at 23:07
@HerrKaputt Sorry, it sure has... I somehow convinced myself that nobody would have been trying to do this, and so I didn't do a careful search for existing questions :( — max, Dec 10 '12 at 00:44
No need to apologize! In fact, I don't think hayden's answer (using nanmean) was mentioned in that other link... — HerrKaputt, Dec 10 '12 at 09:38

Andy Hayden · Accepted Answer · 2012-12-09T23:20:37.433

4

You can use nanmean from scipy.stats:

import numpy as np
from scipy.stats import nanmean
s = np.array([[1.0, 2.0, 3.0], [5.0, np.nan, 7.0], [np.nan, 4.0, 2.0]])

In [4]: nanmean(s)
Out[4]: array([ 3.,  3.,  4.])

@Dougal points out in the comments that the bottleneck package, which has significantly faster implementations of several numpy/scipy functions, includes a nanmean.
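For readers on a current stack: scipy.stats.nanmean was deprecated and later removed from SciPy, while NumPy itself has provided np.nanmean since version 1.8. The following is a minimal sketch of the same pointwise average using that function, assuming the three arrays from the question. The variable names are illustrative only. Unlike the scipy.stats version shown above, np.nanmean averages over the flattened array by default, so axis=0 has to be passed explicitly.

```python
import numpy as np

# The three arrays from the question (names here are illustrative).
arrays = [
    np.array([1.0, 2.0, 3.0]),
    np.array([5.0, np.nan, 7.0]),
    np.array([np.nan, 4.0, 2.0]),
]

# Stack into one 2-D array with one row per input array.
stacked = np.stack(arrays)

# axis=0 averages down each column, i.e. pointwise across the inputs,
# skipping NaN entries.
pointwise_mean = np.nanmean(stacked, axis=0)
print(pointwise_mean)
```

If every input has NaN at the same position, np.nanmean returns NaN for that position and emits a RuntimeWarning.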

edited Dec 09 '12 at 23:20

answered Dec 09 '12 at 23:12


3

Note that [the bottleneck package](http://pypi.python.org/pypi/Bottleneck) has `bottleneck.nanmean`, which runs 10-30 times faster in their tests than does `scipy.stats.nanmean`. – Danica Dec 09 '12 at 23:13

score 1 · Answer 2 · answered Dec 09 '12 at 23:50

You can also convert the array to a masked array (masking all the NaNs with fix_invalid) and perform your operations there:

new_array = np.ma.fix_invalid(my_array)
print(np.mean(new_array))

If it's just for the average, then the suggested nanmean by @hayden is about 4x faster. But if you want to do other operations on the array, it's a better bet to use masked arrays instead.
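To get the pointwise average the question asks for, the masked-array mean needs an axis. Without one, it averages over every unmasked element and returns a single number. Below is a minimal sketch, assuming my_array holds the question's three arrays as rows (that sample data is an assumption, not part of this answer).

```python
import numpy as np

# Sample data assumed from the question: one row per input array.
my_array = np.array([[1.0, 2.0, 3.0],
                     [5.0, np.nan, 7.0],
                     [np.nan, 4.0, 2.0]])

# Mask NaN (and inf) entries so they are excluded from reductions.
masked = np.ma.fix_invalid(my_array)

# Reduce along axis 0 to average pointwise across the rows.
pointwise = masked.mean(axis=0)

# Convert back to a plain ndarray; positions that were masked in every
# row would come out as NaN.
result = pointwise.filled(np.nan)
print(result)
```

The same masked array can then be reused for other reductions, such as masked.std(axis=0) or masked.max(axis=0), without handling NaN again.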
