
Is there any way to make this work, where the array I'm working on contains None values that should be ignored in the processing? For example, I would like to normalize this array:

output = np.array([[1,2,None,4,5],[None,7,8,9,10]])
mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd

Expected outcome:

array([[-1.5666989 , -1.21854359, None, -0.52223297, -0.17407766],
       [ None,  0.52223297,  0.87038828,  1.21854359,  1.5666989 ]])

Edit: As suggested, it is better to use NaN instead of None. How do I get this to work with NaN?

output = np.array([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])
mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd
print(normalized_output)
# array([[nan, nan, nan, nan, nan],
#        [nan, nan, nan, nan, nan]])
Jingles
  • If you have None in your vector, this is a very bad sign: it means the values in the array are of type `object`, so all related computations are unoptimized. Consider using NaN, which is a native float value. – Jérôme Richard Apr 25 '21 at 15:00
  • Thanks for your input. I didn't know None was bad for vectors. I can use `where` to change it to NaN. I updated the question with your suggestion. – Jingles Apr 25 '21 at 15:22
  • If you want to keep your values integers, use a masked array instead of NaN. I regard NaN as the result of a bad computation (0/0 for example), while a masked value indicates the absence of the value: two different things. NaN is often used for both, but that can lead to confusion. – 9769953 Apr 25 '21 at 15:31
  • NaNs are also taken into account when calculating, for example, a mean value. There are special `nanmean` functions, but here, I think a masked array is more appropriate. – 9769953 Apr 25 '21 at 15:32
  • Does this answer your question? [NumPy: calculate averages with NaNs removed](https://stackoverflow.com/questions/5480694/numpy-calculate-averages-with-nans-removed) – 9769953 Apr 25 '21 at 15:38
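As a quick sketch of the conversion discussed in the comments: casting the object array from the question to float turns each None into NaN, producing a fast native array.

```python
import numpy as np

# Object array containing None, as in the original question
output = np.array([[1, 2, None, 4, 5], [None, 7, 8, 9, 10]])
print(output.dtype)  # object -> slow, unoptimized computations

# Casting to float converts each None into nan, giving a native float array
output = output.astype(float)
print(output.dtype)  # float64
```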

2 Answers


You can perform calculations that skip over certain values by using NumPy masked arrays.

A function already exists to create a masked array that masks NaN values: `ma.masked_invalid`.

It can be used like so:

import numpy as np
from numpy import ma


output = ma.masked_invalid([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])

mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd
print(normalized_output)

Output (-- represents an invalid value):

[[-1.5461980716652028 -1.2206826881567392 -- -0.5696519211398116
  -0.24413653763134782]
 [-- 0.40689422938557973 0.7324096128940435 1.0579249964025073
  1.3834403799109711]]
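If you need a plain ndarray afterwards, the masked result can be converted back with the masked slots filled by NaN; a minimal sketch using `MaskedArray.filled`:

```python
import numpy as np
from numpy import ma

output = ma.masked_invalid([[1, 2, np.nan, 4, 5], [np.nan, 7, 8, 9, 10]])
normalized_output = (output - output.mean()) / output.std()

# filled() replaces each masked slot with the given value
# and returns a plain ndarray
plain = normalized_output.filled(np.nan)
print(type(plain))  # <class 'numpy.ndarray'>
```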
Oli

You can use the np.nanmean and np.nanstd functions instead of np.mean and np.std:

output = np.array([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])
mu = np.nanmean(output, axis=(0,1), keepdims=True)
sd = np.nanstd(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd

You will get output like this:

array([[-1.54619807, -1.22068269,         nan, -0.56965192, -0.24413654],
       [        nan,  0.40689423,  0.73240961,  1.057925  ,  1.38344038]])

It is different from your desired output because np.nanmean and np.nanstd ignore the NaN values present in the array, so the statistics are computed from 8 elements instead of 10.
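To see where the 8 comes from, you can count the entries that the nan-aware reductions actually use; a quick check:

```python
import numpy as np

output = np.array([[1, 2, np.nan, 4, 5], [np.nan, 7, 8, 9, 10]])

# Count the non-NaN entries that np.nanmean/np.nanstd operate on
n_valid = np.count_nonzero(~np.isnan(output))
print(n_valid)  # 8
```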

Zalak Bhalani