
Is there any way to make this work, where the array I'm working on contains None values that should be ignored in the processing? For example, I would like to normalize this array:

output = np.array([[1,2,None,4,5],[None,7,8,9,10]])
mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd

Expected outcome:

array([[-1.5666989 , -1.21854359, None, -0.52223297, -0.17407766],
       [ None,  0.52223297,  0.87038828,  1.21854359,  1.5666989 ]])

Edit: As suggested, it is better to use NaN instead of None. How do I get this to work with NaN?

output = np.array([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])
mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd
print(normalized_output)
# array([[nan, nan, nan, nan, nan],
#        [nan, nan, nan, nan, nan]])
Jingles
  • If you have None in your vector, this is a very bad sign: it means the values in the array are of type `object`, so all related computations are unoptimized. Consider using NaN, which is a native float value. – Jérôme Richard Apr 25 '21 at 15:00
  • Thanks for your input. I didn't know None was bad for vectors. I can use `where` to change it to NaN. I updated the question with your suggestion. – Jingles Apr 25 '21 at 15:22
  • If you want to keep your values integers, use a masked array instead of NaN. I regard NaN as the result of a bad computation (0/0 for example), while a masked value indicates the absence of the value: two different things. NaN is often used for both, but that can lead to confusion. – 9769953 Apr 25 '21 at 15:31
  • NaNs are also taken into account when calculating, for example, a mean value. There are special `nanmean` functions, but here, I think a masked array is more appropriate. – 9769953 Apr 25 '21 at 15:32
  • Does this answer your question? [NumPy: calculate averages with NaNs removed](https://stackoverflow.com/questions/5480694/numpy-calculate-averages-with-nans-removed) – 9769953 Apr 25 '21 at 15:38
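As a quick sketch of the conversion discussed in the comments: casting the object array from the question to float turns each None into NaN, producing a fast native array.

```python
import numpy as np

# Object array containing None, as in the original question
output = np.array([[1, 2, None, 4, 5], [None, 7, 8, 9, 10]])
print(output.dtype)  # object -> slow, unoptimized computations

# Casting to float converts each None into nan, giving a native float array
output = output.astype(float)
print(output.dtype)  # float64
```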

2 Answers


You can perform calculations that skip over certain values by using NumPy masked arrays.

A function already exists to create a masked array that masks NaN values: `ma.masked_invalid`.

It can be used like so:

import numpy as np
from numpy import ma


output = ma.masked_invalid([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])

mu = np.mean(output, axis=(0,1), keepdims=True)
sd = np.std(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd
print(normalized_output)

Output (-- represents an invalid value):

[[-1.5461980716652028 -1.2206826881567392 -- -0.5696519211398116
  -0.24413653763134782]
 [-- 0.40689422938557973 0.7324096128940435 1.0579249964025073
  1.3834403799109711]]
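If you need a plain ndarray afterwards, the masked result can be converted back with the masked slots filled by NaN; a minimal sketch using `MaskedArray.filled`:

```python
import numpy as np
from numpy import ma

output = ma.masked_invalid([[1, 2, np.nan, 4, 5], [np.nan, 7, 8, 9, 10]])
normalized_output = (output - output.mean()) / output.std()

# filled() replaces each masked slot with the given value
# and returns a plain ndarray
plain = normalized_output.filled(np.nan)
print(type(plain))  # <class 'numpy.ndarray'>
```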
Oli

You can use the np.nanmean and np.nanstd functions instead of np.mean and np.std:

output = np.array([[1,2,np.nan,4,5],[np.nan,7,8,9,10]])
mu = np.nanmean(output, axis=(0,1), keepdims=True)
sd = np.nanstd(output, axis=(0,1), keepdims=True)
normalized_output = (output - mu)/sd

You will get output like this:

array([[-1.54619807, -1.22068269,         nan, -0.56965192, -0.24413654],
       [        nan,  0.40689423,  0.73240961,  1.057925  ,  1.38344038]])

It is different from your desired output because np.nanmean and np.nanstd ignore the NaN values present in the array, so the statistics are computed from 8 elements instead of 10.
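To see where the 8 comes from, you can count the entries that the nan-aware reductions actually use; a quick check:

```python
import numpy as np

output = np.array([[1, 2, np.nan, 4, 5], [np.nan, 7, 8, 9, 10]])

# Count the non-NaN entries that np.nanmean/np.nanstd operate on
n_valid = np.count_nonzero(~np.isnan(output))
print(n_valid)  # 8
```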

Zalak Bhalani