I have a NumPy array X that contains nan values.

import numpy as np

X = np.array([[  1.,     2.,     3.],
              [  4.,  np.nan,   54.],
              [ 90.,    32.,  np.nan],
              [ 55.,    42.,    86.]])

I'd like to replace each nan with its own random number, so that no two nans get the same value. I can generate a random number easily with np.random.randn(), and I can use a mask to locate and count the nans.

import numpy.ma as ma
mx = ma.masked_array(X, mask=np.isnan(X))  # locate nans
mx.mask.sum()  # count nans so I know how many random values to generate

My issue is that I don't know how to insert the random values quickly and efficiently. The example above is a very small dataset, but my real one is much larger, so efficiency is key.

If I try

X[mx.mask] = np.random.randn()  # or
X[mx.mask] = np.random.randn(mx.mask.sum())

The first line replaces every nan with the same random number, which is not what I want. The second gives me a broadcast error.
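
To make the first failure concrete, here is a minimal sketch on a small hypothetical array (X_demo is not part of my real data). Called with no arguments, np.random.randn() returns a single float, and NumPy broadcasts that one value to every masked position:

```python
import numpy as np

# Small hypothetical array with two nan entries (illustration only)
X_demo = np.array([[1.0, np.nan],
                   [np.nan, 4.0]])
mask = np.isnan(X_demo)

# randn() with no arguments returns a single float,
# which is broadcast to every True position in the mask
X_demo[mask] = np.random.randn()

# Both former nans now hold the same number
same_value = X_demo[0, 1] == X_demo[1, 0]
```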

Any suggestions?

Terence Chow
  • Does this help? http://stackoverflow.com/questions/7701429/efficient-evaluation-of-a-function-at-every-cell-of-a-numpy-array – grebneke Jan 28 '14 at 06:23

1 Answer

X[np.isnan(X)] = np.random.randn(len(X[np.isnan(X)]))

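The above works for me on NumPy 1.8.0. The boolean mask np.isnan(X) selects the nan positions, and randn is asked for exactly that many values, so each nan gets its own number.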
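
Since the array is large, it may be worth computing the mask only once instead of calling np.isnan(X) twice. A minimal sketch of that variant, using the example array from the question (the names mask and n_missing are just illustrative):

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, np.nan, 54.0],
              [90.0, 32.0, np.nan],
              [55.0, 42.0, 86.0]])

# Compute the boolean mask once and reuse it
mask = np.isnan(X)

# Number of nan entries, i.e. how many random values to draw
n_missing = np.count_nonzero(mask)

# Draw one standard-normal value per nan and assign them in place;
# boolean-mask assignment fills the True positions in row-major order
X[mask] = np.random.randn(n_missing)
```

This modifies X in place. Work on X.copy() if the original array needs to be kept.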

U2EF1