I have a NumPy array X that has nan values in it.
import numpy as np
X = np.array([[ 1.,  2.,      3.],
              [ 4.,  np.nan, 54.],
              [90., 32.,      np.nan],
              [55., 42.,     86.]])
I'd like to replace each nan with its own random number. I can generate random numbers easily with np.random.randn(), and I can use a mask to locate and count the nans.
import numpy.ma as ma
mx = ma.masked_array(X, mask=np.isnan(X))  # locate nans
mx.mask.sum()  # count nans so I know how many random values to generate
My issue is that I don't know how to insert them quickly and efficiently. The example above is a very small dataset, but my real one is much, much larger, so efficiency is key.
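To be explicit about the result I'm after, here is a slow, loop-based sketch. It should give each nan its own draw, but it visits the positions one at a time in Python, so I don't expect it to scale. The name `filled` and the fixed seed are just my choices for illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed only so the sketch is reproducible (my assumption)

X = np.array([[ 1.,  2.,      3.],
              [ 4.,  np.nan, 54.],
              [90., 32.,      np.nan],
              [55., 42.,     86.]])

filled = X.copy()  # work on a copy so X keeps its nans

# np.argwhere returns one (row, col) pair per nan position.
for i, j in np.argwhere(np.isnan(X)):
    # A separate call per position, so every nan gets its own value.
    filled[i, j] = np.random.randn()
```

This is only a correctness baseline: the non-nan entries are untouched and each nan gets an independent standard-normal value. What I want is the same result without the Python-level loop.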
If I try
X[mx.mask] = np.random.randn()  # or
X[mx.mask] = np.random.randn(mx.mask.sum())
In the first case every nan is replaced with the same random number, which is not what I want; in the second case I get a broadcast error.
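To make the first failure concrete, here is a small sketch of what I believe is happening. It uses a plain boolean mask from np.isnan instead of the masked array, and the names `mask` and `same` are just for illustration.

```python
import numpy as np

X = np.array([[ 1.,  2.,      3.],
              [ 4.,  np.nan, 54.],
              [90., 32.,      np.nan],
              [55., 42.,     86.]])

mask = np.isnan(X)  # boolean array, True where X is nan

# Called with no arguments, np.random.randn() returns a single float,
# so that one value is broadcast into every masked position.
X[mask] = np.random.randn()

same = X[1, 1] == X[2, 2]  # both former nans now hold the identical value
```

So the nans are gone, but both positions end up with one shared draw instead of two independent ones.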
Any suggestions?