Taking np.average while ignoring NaN's?

Question

I have a matrix of shape (64, 17) corresponding to time and latitude. I want to take a latitude-weighted average. I know np.average can do this because it accepts weights, unlike np.nanmean, which I used to average over the longitudes. However, np.average doesn't ignore NaN the way np.nanmean does, so the first 5 entries of each row are included in the latitude averaging and turn the entire time series into NaN.

Is there a way I can take a weighted average without the NaN's being included in the calculation?

file = Dataset("sst_aso_1951-2014latlon_seasavgs.nc")
sst = file.variables['sst']
lat = file.variables['lat']

sst_filt = np.asarray(sst)
missing_values_indices = sst_filt < -8000000   #missing values have value -infinity
sst_filt[missing_values_indices] = np.nan      #all missing values set to NaN

weights = np.cos(np.deg2rad(lat))
sst_zonalavg = np.nanmean(sst_filt, axis=2)
print(sst_zonalavg[0, :])
sst_ts = np.average(sst_zonalavg, axis=1, weights=weights)
print(sst_ts[:])

Output:

[ nan nan nan nan nan
 27.08499908 27.33333397 28.1457119 28.32899857 28.34454346
 28.27285767 28.18571472 28.10199928 28.10812378 28.03411865
 28.06411552 28.16529465]

[ nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
 nan nan nan nan]

score 22 · Accepted Answer · edited Dec 06 '18 at 14:44


You can create a masked array like this:

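import numpy as np

data = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [0, 0, 0]])
# mask every NaN entry so it is excluded from the average
masked_data = np.ma.masked_array(data, np.isnan(data))
# weighted average along each row; masked entries and their weights are skipped
weights = [1, 1, 1]
average = np.ma.average(masked_data, axis=1, weights=weights)
# convert back to a plain ndarray, with NaN wherever a row was fully masked
result = average.filled(np.nan)
print(result)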

This outputs:

[ 2.   4.5  6.   0. ]
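As a rough sketch of how this might carry over to the question's (64, 17) time-by-latitude array, the example below uses synthetic data in place of the NetCDF file. The latitude grid, the random values, and the variable names other than `weights` are assumptions made purely for illustration.

```python
import numpy as np

# Synthetic stand-in for the zonal-mean SST array: 64 time steps x 17 latitudes.
rng = np.random.default_rng(0)
sst_zonalavg = 27.0 + rng.random((64, 17))
sst_zonalavg[:, :5] = np.nan  # first five latitudes missing, as in the question

# Hypothetical latitude grid; the real one would come from the NetCDF file.
lat = np.linspace(-40.0, 40.0, 17)
weights = np.cos(np.deg2rad(lat))

# Mask the NaNs, then let np.ma.average skip them together with their weights.
masked = np.ma.masked_invalid(sst_zonalavg)
sst_ts = np.ma.average(masked, axis=1, weights=weights).filled(np.nan)

print(sst_ts.shape)            # (64,)
print(np.isnan(sst_ts).any())  # False
```

Each time step is averaged over the 12 valid latitudes only, with the cosine weights renormalized over those latitudes.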

edited Dec 06 '18 at 14:44

Gabriel


answered Mar 02 '16 at 21:22

Alex


I mentioned that I can't use np.nanmean because it doesn't take weights in its arguments. I'm trying to do a weighted average. – Cebbie Mar 02 '16 at 21:25
I have updated the answer to use a masked array and `np.mean` – Alex Mar 02 '16 at 21:30
I was about to edit a mention into the original post that since I'm doing a time series, removing the NaN from the data is also an option, but you beat me to it! – Cebbie Mar 02 '16 at 21:36
Edit: Actually, this still doesn't quite work. I still need to take a WEIGHTED average, which np.mean doesn't do. When I use np.average instead, it still outputs NaNs. – Cebbie Mar 02 '16 at 21:40
I have updated my answer and it should work now. You need to use `np.ma.average` for masked arrays; please note the `.ma`. – Alex Mar 02 '16 at 21:49

Divakar · Answer 2 · 2020-08-06T20:07:19.387

You can simply multiply the input array by the weights and sum along the specified axis with np.nansum, which ignores NaNs. For your case, assuming the weights apply along axis=1 of the input array sst_filt, the weighted sums would be:

np.nansum(sst_filt*weights,axis=1)

To turn these sums into averages, divide by the total weight of the non-NaN entries only, so that the NaNs are accounted for in the normalization as well:

def nanaverage(A,weights,axis):
    return np.nansum(A*weights,axis=axis)/((~np.isnan(A))*weights).sum(axis=axis)

Sample run -

In [200]: sst_filt  # 2D array case
Out[200]: 
array([[  0.,   1.],
       [ nan,   3.],
       [  4.,   5.]])

In [201]: weights
Out[201]: array([ 0.25,  0.75])

In [202]: nanaverage(sst_filt,weights=weights,axis=1)
Out[202]: array([0.75, 3.  , 4.75])
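As a quick sanity check, the sketch below compares `nanaverage` against `np.ma.average` on the same sample input. The names `via_nansum` and `via_masked` are introduced here only for illustration.

```python
import numpy as np

def nanaverage(A, weights, axis):
    # Same formula as in the answer: weighted NaN-ignoring sum divided by
    # the total weight of the non-NaN entries.
    return np.nansum(A * weights, axis=axis) / ((~np.isnan(A)) * weights).sum(axis=axis)

sst_filt = np.array([[0.0, 1.0],
                     [np.nan, 3.0],
                     [4.0, 5.0]])
weights = np.array([0.25, 0.75])

via_nansum = nanaverage(sst_filt, weights=weights, axis=1)
via_masked = np.ma.average(
    np.ma.masked_invalid(sst_filt), axis=1, weights=weights
).filled(np.nan)

print(via_nansum)
print(np.allclose(via_nansum, via_masked))  # True
```

Note that `A * weights` relies on broadcasting along the last axis, so 1D weights line up as written only when `axis` is the last axis of `A`. For another axis the weights would need to be reshaped first (for example `weights[:, None]` for axis=0 of a 2D array).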

Will your solution work if both arrays are 2D and both contain NaNs? – user308827 Dec 22 '21 at 00:33
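As written, a NaN in the weights would propagate through the plain `.sum()` in the denominator of `nanaverage`. One possible adaptation for that case, not part of the original answer, is to count an element only when both its value and its weight are non-NaN. `nanaverage_2d` is a hypothetical helper name used for this sketch.

```python
import numpy as np

def nanaverage_2d(A, W, axis):
    # Hypothetical helper: an element contributes only if both the value
    # and its weight are non-NaN.
    valid = ~np.isnan(A) & ~np.isnan(W)
    num = np.where(valid, A * W, 0.0).sum(axis=axis)
    den = np.where(valid, W, 0.0).sum(axis=axis)
    return num / den

A = np.array([[1.0, 2.0, np.nan],
              [4.0, np.nan, 6.0]])
W = np.array([[1.0, np.nan, 2.0],
              [1.0, 1.0, 3.0]])

print(nanaverage_2d(A, W, axis=1))  # row 0 -> 1.0, row 1 -> 5.5
```

A slice with no valid pairs at all ends up as 0/0, which NumPy evaluates to NaN with a runtime warning.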

deto · Answer 3 · 2016-03-02T22:01:21.577

I'd probably just select the portion of the array that isn't NaN and then use those indices to select the weights too.

For example:

import numpy as np
data = np.random.rand(10)
weights = np.random.rand(10)
data[[2, 4, 8]] = np.nan

print(data)
# [ 0.32849204,  0.90310062,         nan,  0.58580299,         nan,
#    0.934721  ,  0.44412978,  0.78804409,         nan,  0.24942098]

ii = ~np.isnan(data)
print(ii)
# [ True  True False  True False  True  True  True False  True]

result = np.average(data[ii], weights=weights[ii])
print(result)
# 0.6470319

Edit: I realized this won't work with two-dimensional arrays. In that case, I'd probably just set both the values and the weights to zero wherever the data is NaN. This yields the same result as if those entries were simply left out of the calculation.

Before running np.average:

data[np.isnan(data)] = 0;
weights[np.isnan(data)] = 0;
result = np.average(data, weights=weights)

Or create copies if you want to keep track of which indices were NaN.
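A minimal sketch of the two-dimensional case, assuming 1D weights that apply along axis=1. The NaN mask is computed once up front, and `w2d` and `filled` are names made up for this illustration.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0]])
weights = np.array([0.2, 0.3, 0.5])

mask = np.isnan(data)

# Boolean indexing a 2D array returns a flattened 1D array, so the row
# structure needed for a per-row average is lost.
print(data[~mask].shape)  # (5,)

# Expand the weights to the shape of data so they can be zeroed per element.
w2d = np.broadcast_to(weights, data.shape).copy()
w2d[mask] = 0.0
filled = np.where(mask, 0.0, data)

result = np.average(filled, axis=1, weights=w2d)
print(result)  # approximately [2.3, 5.43]
```

One caveat: if a whole row is NaN, its weights sum to zero and, as far as I know, `np.average` raises a `ZeroDivisionError` rather than returning NaN for that row.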

Why does your original solution not work for 2D arrays? – user308827 Dec 22 '21 at 00:43

score 1 · Answer 4 · answered Apr 29 '19 at 16:26

@deto

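The first line replaces every NaN in data with 0, so by the time the second line runs, np.isnan(data) is False everywhere and none of the weights are set to zero. The result is therefore incorrect.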

data[np.isnan(data)] = 0;
weights[np.isnan(data)] = 0;
result = np.average(data, weights=weights)

A copy should be taken before running the first line, so that the NaN positions are still available when zeroing the weights:

import copy

data_copy = copy.deepcopy(data)
data[np.isnan(data_copy)] = 0
weights[np.isnan(data_copy)] = 0
result = np.average(data, weights=weights)
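An alternative sketch that avoids the copy is to compute the NaN mask once and reuse it. It uses `np.where`, so the original `data` and `weights` are left untouched. The sample values and the names `nan_mask`, `data_filled` and `weights_filled` are illustrative.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan])
weights = np.array([0.1, 0.2, 0.3, 0.4])

# Record where the NaNs are before anything is overwritten.
nan_mask = np.isnan(data)

# Build zero-filled versions without modifying the originals.
data_filled = np.where(nan_mask, 0.0, data)
weights_filled = np.where(nan_mask, 0.0, weights)

result = np.average(data_filled, weights=weights_filled)
print(result)  # approximately 2.5, i.e. (0.1*1 + 0.3*3) / (0.1 + 0.3)
```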
