Mean ignoring NaNs along columns in a NumPy array without using numpy.nanmean

Question

I have a numpy array like the following:

x = array([[  1.,   2.,   3.],
           [  4.,   5.,   6.],
           [ nan,   8.,   9.]])

and I want to calculate the mean of each column. If I use np.mean(x, axis=0), then I get nan as the mean of the first column, and using x[~np.isnan(x)] to filter out nan values flattens the array into a 1D array.
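To make the two failure modes concrete, here is a minimal sketch that reproduces them on the array above. The variable names `col_means` and `flat` are only illustrative and are not part of the original question.

```python
import numpy as np

x = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [np.nan, 8., 9.]])

# A single NaN propagates into the mean of its whole column.
col_means = np.mean(x, axis=0)

# Indexing a 2D array with a 2D boolean mask returns a 1D array,
# so the column structure is lost.
flat = x[~np.isnan(x)]

print(col_means)
print(flat.shape)
```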

I'm required to use an older version of NumPy, so I can't use numpy.nanmean.

Edit: The comment below explains why this isn't a duplicate of the linked question.

Possible duplicate of [NumPy: calculate averages with NaNs removed](https://stackoverflow.com/questions/5480694/numpy-calculate-averages-with-nans-removed) - Stefano Nardo, Jun 27 '17 at 13:32
@StefanoNardo Good find. The answers to the linked Q&A basically suggest either using `numpy.ma.masked_array`, which I haven't found to be efficient, or using `nanmean` in some form, which OP can't use. Given the circumstances, IMHO using a regular boolean array for masking would be the way to go. - Divakar, Jun 27 '17 at 13:38
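For reference, the masked-array route mentioned in the comment above might look roughly like the sketch below. It assumes `numpy.ma.masked_invalid` is available in the NumPy version in use, and the names `masked` and `ma_means` are only illustrative. No performance claim is made here.

```python
import numpy as np

x = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [np.nan, 8., 9.]])

# Mask out NaN (and inf) entries, then average the unmasked values per column.
masked = np.ma.masked_invalid(x)

# mean() returns a masked array; filled() converts it back to a plain ndarray,
# putting NaN wherever a whole column was masked.
ma_means = masked.mean(axis=0).filled(np.nan)

print(ma_means)
```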

score 2 · Accepted Answer · answered Jun 27 '17 at 13:29

One approach is to use a boolean mask of the non-NaN entries -

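import numpy as np

def nanmean_cols(x):
    mask = ~np.isnan(x)               # True where the value is valid
    x_masked = np.where(mask, x, 0)   # replace NaNs with 0 so they don't affect the sum
    return x_masked.sum(0) / mask.sum(0)  # column sums / number of valid values per column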

Sample run -

In [114]: x
Out[114]: 
array([[  1.,   2.,   3.],
       [  4.,   5.,   6.],
       [ nan,   8.,   9.]])

In [115]: np.nanmean(x,axis=0)
Out[115]: array([ 2.5,  5. ,  6. ])

In [117]: nanmean_cols(x)
Out[117]: array([ 2.5,  5. ,  6. ])
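One edge case the sample run doesn't cover is a column that contains only NaNs, where the division above becomes 0/0. A possible guard is sketched below. `nanmean_cols_safe` is a hypothetical variant, not part of the original answer, and it assumes that returning NaN for such columns is the desired behaviour.

```python
import numpy as np

def nanmean_cols_safe(x):
    # Hypothetical variant of nanmean_cols that avoids dividing 0 by 0.
    mask = ~np.isnan(x)
    counts = mask.sum(axis=0)                  # valid values per column
    totals = np.where(mask, x, 0).sum(axis=0)  # column sums with NaNs treated as 0
    # Divide by at least 1 so empty columns don't trigger 0/0 ...
    means = totals / np.maximum(counts, 1)
    # ... then mark columns with no valid values as NaN explicitly.
    return np.where(counts > 0, means, np.nan)

y = np.array([[1., np.nan],
              [3., np.nan]])
result = nanmean_cols_safe(y)
print(result)
```

For arrays without all-NaN columns this should give the same values as `nanmean_cols`.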

vahlala · Answer 2 · 2017-06-27T13:47:20.107

score 0

I figured out another approach that loops over the columns and filters the NaNs out of each one separately:

means = []
# Iterate over each column in x
for col in x.T:
    filtered_vals = col[~np.isnan(col)]
    avg = np.mean(filtered_vals)
    means.append(avg)

One line version:

means = [np.mean(col[~np.isnan(col)]) for col in x.T]
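Note that the list comprehension produces a Python list rather than an array. A small sketch, assuming an ndarray result is wanted, wraps it in `np.array` and compares it with the mask-based computation. The name `vectorized` is only illustrative.

```python
import numpy as np

x = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [np.nan, 8., 9.]])

# Per-column loop: drop the NaNs from each column, then average what is left.
means = np.array([np.mean(col[~np.isnan(col)]) for col in x.T])

# Mask-based computation over the whole array, for comparison.
mask = ~np.isnan(x)
vectorized = np.where(mask, x, 0).sum(axis=0) / mask.sum(axis=0)

print(means)
print(np.allclose(means, vectorized))
```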

edited Jun 27 '17 at 13:47

answered Jun 27 '17 at 13:40

vahlala

