13

For testing purposes, I'd like to create an M-by-N NumPy array with c randomly placed NaNs.

import numpy as np

M = 10
N = 5
c = 15
A = np.random.randn(M, N)

A[mask] = np.nan

I am having trouble creating a mask with exactly c True elements. Or can this be done with indices directly?

Oleg

2 Answers

18

You can use np.random.choice with replace=False to pick c distinct flat indices (random selection without replacement), then assign through a flattened view of A, obtained with .ravel(), like so:

A.ravel()[np.random.choice(A.size, c, replace=False)] = np.nan

Sample run:

In [100]: A
Out[100]: 
array([[-0.35365726,  0.26754527, -0.44985524, -1.29520237,  2.01505444],
       [ 0.01319146,  0.65150356, -2.32054478,  0.40924753,  0.24761671],
       [ 0.3014714 , -0.80688589, -2.61431163,  0.07787956,  1.23381951],
       [-1.70725777,  0.07856845, -1.04354202, -0.68904925,  1.07161002],
       [-1.08061614,  1.17728247, -1.5913516 , -1.87601976,  1.14655867],
       [ 1.12542853, -0.26290025, -1.0371326 ,  0.53019033, -1.20766258],
       [ 1.00692277,  0.171661  , -0.89646634,  1.87619114, -1.04900026],
       [ 0.22238353, -0.6523747 , -0.38951426,  0.78449948, -1.14698869],
       [ 0.58023183,  1.99987331, -0.85938155,  1.4211672 , -0.43369898],
       [-2.15682219, -0.6872121 , -1.28073816, -0.97523148, -2.27967001]])

In [101]: A.ravel()[np.random.choice(A.size, c, replace=False)] = np.nan

In [102]: A
Out[102]: 
array([[        nan,  0.26754527, -0.44985524,         nan,  2.01505444],
       [ 0.01319146,  0.65150356, -2.32054478,         nan,  0.24761671],
       [        nan, -0.80688589,         nan,         nan,  1.23381951],
       [        nan,         nan, -1.04354202, -0.68904925,  1.07161002],
       [-1.08061614,  1.17728247, -1.5913516 ,         nan,  1.14655867],
       [ 1.12542853,         nan, -1.0371326 ,  0.53019033, -1.20766258],
       [        nan,  0.171661  , -0.89646634,         nan,         nan],
       [ 0.22238353, -0.6523747 , -0.38951426,  0.78449948, -1.14698869],
       [ 0.58023183,  1.99987331, -0.85938155,         nan, -0.43369898],
       [-2.15682219, -0.6872121 , -1.28073816, -0.97523148,         nan]])
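
To make the one-liner above reusable and reproducible in tests, it can be wrapped in a small helper. This is only a sketch under a few assumptions: the name `add_random_nans` and its `seed` parameter are made up for illustration, it uses the newer `np.random.default_rng` generator (NumPy 1.17+) instead of the legacy `np.random.choice`, and it assigns through `A.flat` rather than `A.ravel()` so that it also works when `A` is not contiguous.

```python
import numpy as np

def add_random_nans(A, c, seed=None):
    """Set exactly c randomly chosen elements of the float array A to NaN, in place.

    Hypothetical helper, not part of NumPy.
    """
    if not 0 <= c <= A.size:
        raise ValueError("c must be between 0 and A.size")
    rng = np.random.default_rng(seed)
    # Draw c distinct flat indices (no replacement, so no index repeats).
    idx = rng.choice(A.size, size=c, replace=False)
    # A.flat always writes into A itself, whatever its memory layout.
    A.flat[idx] = np.nan
    return A

M, N, c = 10, 5, 15
A = np.random.default_rng(0).standard_normal((M, N))
add_random_nans(A, c, seed=1)

print(A.shape)             # (10, 5)
print(np.isnan(A).sum())   # 15
```

Counting the NaNs afterwards with `np.isnan(A).sum()` is a cheap way to confirm that exactly c elements were replaced.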
Divakar
  • Oh, that's a bit more elegant than my way! – tmdavison Aug 24 '15 at 12:46
  • I guess I can also replace `np.random.choice` with `np.random.randint(0,high=A.size,size=c)` for my application (if replacement does not really matter). However, why does the array not stay flat after `ravel()`? – Oleg Aug 24 '15 at 12:53
  • @OlegKomarov `np.random.randint` might give you repeated indices, so I don't think that would work in your case. Regarding the `.ravel()` thing, it's a [`view`](http://stackoverflow.com/questions/4370745/view-onto-a-numpy-array) only, so it's not exactly flattening in memory. So, the "flattened view" is indexed and set as NaNs, while being kept as a 2D array. – Divakar Aug 24 '15 at 12:57
  • Thanks, I was reading the docs in the meantime :). As a final curiosity, the docs for `ravel()` say `A copy is made only if needed`. Can it happen that I get a flattened `A`? – Oleg Aug 24 '15 at 13:03
  • @OlegKomarov If you are just indexing it, it must stay as a 2D array. You can also use [`np.put`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.put.html) for the same effect. So, the solution with it would be `np.put(A,np.random.choice(A.size, c, replace=False),np.nan)`. – Divakar Aug 24 '15 at 13:14
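
The "copy is made only if needed" caveat raised in the comments can be checked directly. The sketch below assumes a small zero-filled array and takes strided slices of it as examples of non-contiguous arrays (the names `B`, `C` and `idx` are illustrative). It suggests that assigning through `.ravel()` only reaches the original when `.ravel()` returns a view, whereas `.flat` and `np.put` write into the array itself.

```python
import numpy as np

A = np.zeros((4, 6))
B = A[:, ::2]    # every second column: a non-contiguous view, shape (4, 3)

# For the contiguous A, ravel() is a view; for B it has to copy.
print(np.shares_memory(A, A.ravel()))   # True
print(np.shares_memory(B, B.ravel()))   # False

idx = np.array([0, 5, 7])

# This writes into the temporary copy, so B (and A) stay unchanged.
B.ravel()[idx] = np.nan
n_after_ravel = int(np.isnan(B).sum())

# .flat indexes B itself, so the NaNs land in B (and therefore in A).
B.flat[idx] = np.nan
n_after_flat = int(np.isnan(B).sum())

# np.put also writes into its target, here the other half of the columns.
C = A[:, 1::2]
np.put(C, idx, np.nan)
n_after_put = int(np.isnan(C).sum())

print(n_after_ravel, n_after_flat, n_after_put)   # 0 3 3
```

For an array freshly created by `np.random.randn(M, N)`, which is contiguous, the `.ravel()` version works as shown in the answer. The distinction only matters for slices, transposes and other non-contiguous arrays.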
9

You could use np.random.shuffle on a new array to create your mask:

import numpy as np

M = 10
N = 5
c = 15
A = np.random.randn(M, N)

mask = np.zeros(M*N, dtype=bool)
mask[:c] = True
np.random.shuffle(mask)
mask = mask.reshape(M, N)

A[mask] = np.nan

Which gives:

[[ 0.98244168  0.72121195  0.99291217  0.17035834  0.46987918]
 [ 0.76919975  0.53102064         nan  0.78776918         nan]
 [ 0.50931304  0.91826809  0.52717345         nan         nan]
 [ 0.35445471  0.28048106  0.91922292  0.76091783  0.43256409]
 [ 0.69981284  0.0620876   0.92502572         nan         nan]
 [        nan         nan         nan  0.24466688  0.70259211]
 [ 0.4916004          nan         nan  0.94945378  0.73983538]
 [ 0.89057404  0.4542628          nan  0.95547377         nan]
 [ 0.4071912   0.36066797  0.73169132  0.48217226  0.62607888]
 [ 0.30341337         nan  0.75608859  0.31497997         nan]]
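
The shuffle approach above can be packaged as a function that returns only the mask, which is handy when the same NaN pattern should be applied to several arrays. A minimal sketch, assuming a hypothetical helper name `random_mask` and the newer `np.random.default_rng` generator (NumPy 1.17+) in place of the legacy `np.random.shuffle`:

```python
import numpy as np

def random_mask(shape, c, seed=None):
    """Return a boolean array of the given shape with exactly c True entries.

    Hypothetical helper, not part of NumPy.
    """
    size = int(np.prod(shape))
    if not 0 <= c <= size:
        raise ValueError("c must be between 0 and the number of elements")
    rng = np.random.default_rng(seed)
    mask = np.zeros(size, dtype=bool)
    mask[:c] = True        # c True values at the front ...
    rng.shuffle(mask)      # ... then scattered uniformly at random, in place
    return mask.reshape(shape)

M, N, c = 10, 5, 15
mask = random_mask((M, N), c, seed=0)
A = np.random.default_rng(1).standard_normal((M, N))
A[mask] = np.nan

print(mask.sum())           # 15
print(np.isnan(A).sum())    # 15
```

Passing a fixed `seed` makes the NaN positions repeatable between test runs.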
tmdavison
  • Yours isn't bad either! I had to search for random selection without replacement and found that `np.random.choice` has an optional `replace` argument, which just worked! :) – Divakar Aug 24 '15 at 12:58