Average arrays with Null values

Question

Possible Duplicate:
avarage of a number of arrays with numpy without considering zero values

I am working with NumPy and have a number of arrays with the same size and shape (500*500). They contain some Null values. I want an array that is the element-by-element average of my original arrays, skipping the Null values. For example:

A=[ 1 Null 8 Null; Null 4 6 1]
B=[ 8 5 8 Null; 5 9 5 3]

the resulting array should be like:

C=[ 4.5 5 8 Null; 5 6.5 5.5 2]

How can I do that?

No, I mean there's no thing called `Null` in Python/NumPy. Is it `numpy.nan`, `None` or what? — NPE, Nov 08 '12 at 11:44

score 7 · Accepted Answer · edited May 23 '17 at 11:56

Update: As of NumPy 1.8, you could use np.nanmean instead of scipy.stats.nanmean.

If you have scipy, you could use scipy.stats.nanmean:

In [2]: import numpy as np

In [45]: import scipy.stats as stats

In [3]: nan = np.nan

In [43]: A = np.array([1, nan, 8, nan, nan, 4, 6, 1])   
In [44]: B = np.array([8, 5, 8, nan, 5, 9, 5, 3])  
In [46]: C = np.array([A, B])    
In [47]: C
Out[47]: 
array([[  1.,  nan,   8.,  nan,  nan,   4.,   6.,   1.],
       [  8.,   5.,   8.,  nan,   5.,   9.,   5.,   3.]])

In [48]: stats.nanmean(C)
Warning: invalid value encountered in divide
Out[48]: array([ 4.5,  5. ,  8. ,  nan,  5. ,  6.5,  5.5,  2. ])
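
As a minimal sketch of the `np.nanmean` route mentioned in the update, here is the 2x4 example from the question, assuming the "Null" entries are stored as `np.nan` in float arrays (NumPy 1.8 or later). The warning filter is optional; it only silences the `RuntimeWarning` NumPy raises for positions that are NaN in every input.

```python
import warnings
import numpy as np

nan = np.nan
# The 2x4 arrays from the question, with NaN standing in for "Null".
A = np.array([[1, nan, 8, nan], [nan, 4, 6, 1]])
B = np.array([[8, 5, 8, nan], [5, 9, 5, 3]])

# Combine into one array of shape (2, 2, 4); axis 0 indexes the input arrays.
stacked = np.array([A, B])

with warnings.catch_warnings():
    # Positions that are NaN in every input trigger "Mean of empty slice".
    warnings.simplefilter("ignore", category=RuntimeWarning)
    C = np.nanmean(stacked, axis=0)

print(C)
```

Positions that are NaN in all inputs stay NaN in the result, which matches the `Null` entry in the question's expected `C`.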

There are also NumPy-only solutions based on masked arrays. For example:

In [60]: C = np.array([A, B])    
In [61]: C = np.ma.masked_array(C, np.isnan(C))    
In [62]: C
Out[62]: 
masked_array(data =
 [[1.0 -- 8.0 -- -- 4.0 6.0 1.0]
 [8.0 5.0 8.0 -- 5.0 9.0 5.0 3.0]],
             mask =
 [[False  True False  True  True False False False]
 [False False False  True False False False False]],
       fill_value = 1e+20)

In [63]: np.mean(C, axis = 0)
Out[63]: 
masked_array(data = [4.5 5.0 8.0 -- 5.0 6.5 5.5 2.0],
             mask = [False False False  True False False False False],
       fill_value = 1e+20)

In [66]: np.ma.filled(np.mean(C, axis = 0), nan)
Out[66]: array([ 4.5,  5. ,  8. ,  nan,  5. ,  6.5,  5.5,  2. ])

An advantage of `np.ma` is that it works with integer arrays, while the `nan...` functions require float arrays as inputs. — Pierre GM, Nov 08 '12 at 12:34
@PierreGM: Ah yes, because `np.nan`s are not allowed in integer arrays. Thanks for pointing this out. — unutbu, Nov 08 '12 at 12:36
There's also [numpy.nanmean](http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.nanmean.html) if you don't have scipy. — rjf, Apr 15 '14 at 22:24
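
To illustrate the point about integer arrays, here is a rough sketch that assumes the missing entries are marked with a sentinel value. The sentinel `-1` and the name `MISSING` are hypothetical choices for this example, not something from the question.

```python
import numpy as np

MISSING = -1  # hypothetical sentinel marking a missing entry

# Integer arrays cannot hold np.nan, so a sentinel is used instead.
A = np.array([1, MISSING, 8, MISSING, MISSING, 4, 6, 1])
B = np.array([8, 5, 8, MISSING, 5, 9, 5, 3])

# Mask every sentinel, then average over the stacking axis.
stacked = np.ma.masked_equal(np.array([A, B]), MISSING)
avg = stacked.mean(axis=0)

# Convert to a plain float array, with NaN where every input was missing.
result = avg.filled(np.nan)
print(result)
```

The mean of an integer masked array comes back as floats, so filling the fully masked positions with `np.nan` is safe at that point.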

score 1 · Answer 2 · answered Nov 08 '12 at 11:49

Start from lists like these (you can also use `None` instead of `0`):
```
A = [1, 0, 8, 0, 0, 4, 6, 1]
B = [8, 5, 8, 0, 5, 9, 5, 3]
```
Then collect them into a single list:
```
lst = [A, B]
```

Define a function to compute the mean of a list of numbers:

def mean(nums):
    return float(sum(nums)) / len(nums) if nums else 0

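Finally, compute the column-wise average. Note that `filter(None, col)` drops every falsy value, so both `None` and `0` are ignored. The `list(...)` call is needed on Python 3, where `filter` returns an iterator:

C = [mean(list(filter(None, col))) for col in zip(*lst)]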
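
For reference, a self-contained Python 3 sketch of the same idea, assuming `None` marks the missing entries. The helper name `mean_ignoring_none` is made up for this example. It tests `is not None` explicitly, so genuine zeros are kept in the average, and it returns `None` rather than `0` when a column has no values at all.

```python
def mean_ignoring_none(values):
    """Return the mean of the non-None entries, or None if there are none."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


A = [1, None, 8, None, None, 4, 6, 1]
B = [8, 5, 8, None, 5, 9, 5, 3]

# zip pairs up the entries at each position across the input lists.
C = [mean_ignoring_none(col) for col in zip(A, B)]
print(C)  # [4.5, 5.0, 8.0, None, 5.0, 6.5, 5.5, 2.0]
```

For 500*500 arrays this pure-Python loop will be much slower than the NumPy approaches, so it is mainly useful when the data is already in plain lists.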
