
Possible Duplicate:
NumPy: calculate averages with NaNs removed

I have several identically-shaped numpy arrays. I want to take their pointwise average with a small twist: a np.nan value should be ignored in the averaging. In other words, average(np.array([1,2,3]), np.array([5,np.nan,7]), np.array([np.nan, 4, 2])) should equal np.array([3,3,4]).

Of course, I can do that by iterating through the elements within each numpy array, but I was hoping to avoid it. Is there a better way to implement this function?

(Python 3, but I doubt it matters.)
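For reference, here is a quick sketch of why a plain mean is not enough. It assumes the three example arrays are stacked into one 2-D array with `np.stack` (NumPy 1.10+). Any column that contains a NaN produces a NaN mean:

```python
import numpy as np

# The three example arrays from the question (ints are promoted to float when stacked)
a = np.array([1, 2, 3])
b = np.array([5, np.nan, 7])
c = np.array([np.nan, 4, 2])

stacked = np.stack([a, b, c])   # shape (3, 3): one input array per row
plain = stacked.mean(axis=0)    # ordinary column mean: a single NaN poisons its column
print(plain)                    # first two columns contain a NaN, so only the last is finite
```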

max
  • What you want has been answered here: http://stackoverflow.com/questions/5480694/numpy-calculate-averages-with-nans-removed – HerrKaputt Dec 09 '12 at 23:07
  • @HerrKaputt Sorry, it sure has... I somehow convinced myself that nobody would have been trying to do this, and so I didn't do a careful search for existing questions :( – max Dec 10 '12 at 00:44
  • No need to apologize! In fact, I don't think hayden's answer (using nanmean) was mentioned in that other link... – HerrKaputt Dec 10 '12 at 09:38

2 Answers


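You can use nanmean. Older SciPy releases provided it as scipy.stats.nanmean, but that function was deprecated in SciPy 0.15 and later removed; NumPy 1.8+ provides np.nanmean instead. Unlike the old SciPy version, which averaged along axis 0 by default, np.nanmean averages over the whole array unless you pass axis=0:

import numpy as np
s = np.array([[1.0, 2.0, 3.0], [5.0, np.nan, 7.0], [np.nan, 4.0, 2.0]])  # one input array per row

In [4]: np.nanmean(s, axis=0)  # mean of each column, skipping NaNs
Out[4]: array([3., 3., 4.])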
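To match the average(...) call from the question, which takes the arrays as separate arguments, one option is to stack them along a new leading axis and reduce over it with NumPy's np.nanmean. This is a minimal sketch: `average` is just the hypothetical name used in the question, and it assumes NumPy 1.10+ for np.stack.

```python
import numpy as np

def average(*arrays):
    """Pointwise mean of identically-shaped arrays, ignoring NaNs.

    `average` is the hypothetical function name from the question.
    """
    stacked = np.stack(arrays)          # new leading axis: one slice per input array
    return np.nanmean(stacked, axis=0)  # mean over that axis, skipping NaNs

result = average(np.array([1, 2, 3]),
                 np.array([5, np.nan, 7]),
                 np.array([np.nan, 4, 2]))
print(result)
```

If a position is NaN in every input, np.nanmean returns nan there and emits a RuntimeWarning ("Mean of empty slice").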

@Dougal points out in the comments that the bottleneck package, which has significantly faster implementations of several numpy/scipy functions, includes its own nanmean.
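Conceptually, a NaN-aware mean is the sum of the non-NaN entries divided by how many there are. The sketch below spells that out with np.isnan, np.where and a count, using only NumPy; `manual_nanmean` is a hypothetical helper for illustration, not a library function. It can help when reasoning about what nanmean returns for positions that are NaN in every input.

```python
import numpy as np

def manual_nanmean(stacked, axis=0):
    """Mean along `axis`, ignoring NaNs (illustrative helper, not a library API)."""
    valid = ~np.isnan(stacked)                             # True where a real value exists
    total = np.where(valid, stacked, 0.0).sum(axis=axis)   # NaNs contribute 0 to the sum
    count = valid.sum(axis=axis)                           # number of real values per position
    with np.errstate(invalid="ignore", divide="ignore"):
        return total / count                               # 0/0 gives nan where every entry was NaN

s = np.array([[1.0, 2.0, 3.0], [5.0, np.nan, 7.0], [np.nan, 4.0, 2.0]])
print(manual_nanmean(s))
```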

Andy Hayden
  • Note that [the bottleneck package](http://pypi.python.org/pypi/Bottleneck) has `bottleneck.nanmean`, which runs 10-30 times faster in their tests than does `scipy.stats.nanmean`. – Danica Dec 09 '12 at 23:13

You can also convert the array to a masked array with np.ma.fix_invalid, which masks every NaN (and inf), and then perform your operations on that:

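import numpy as np
new_array = np.ma.fix_invalid(my_array)  # my_array: the input arrays stacked as rows
print(new_array.mean(axis=0))            # column means over the unmasked (non-NaN) entries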

If you only need the average, the nanmean suggested by @hayden is about 4x faster. But if you want to do other operations on the array, masked arrays are the better bet.
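As an illustration of the "other operations" point, here is a sketch assuming the same 3x3 example array from the question, stacked as rows. Once the NaNs are masked, the usual MaskedArray reductions skip them too, and .filled() converts a result back to a plain ndarray:

```python
import numpy as np

my_array = np.array([[1.0, 2.0, 3.0], [5.0, np.nan, 7.0], [np.nan, 4.0, 2.0]])
masked = np.ma.fix_invalid(my_array)   # NaNs (and infs) become masked entries

col_mean = masked.mean(axis=0)         # [3, 3, 4]
col_count = masked.count(axis=0)       # unmasked values per column: [2, 2, 3]
col_max = masked.max(axis=0)           # masked entries are skipped: [5, 4, 7]
col_std = masked.std(axis=0)           # population std over the unmasked values

# Back to a plain ndarray, using NaN wherever a result is itself masked
mean_plain = col_mean.filled(np.nan)
print(col_count, col_max, mean_plain)
```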

tiago