I would like to generate random floating-point numbers, including some NaN values, in a DataFrame using np.random.randn.

boboo

2 Answers

You can generate an array of random floats, then build a boolean mask with np.random.choice, using its p argument to set the probability that each element becomes NaN.

Something like:

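import numpy as np
a = np.random.randn(20)  # 20 samples from the standard normal distribution
# Each element is selected (1 -> True) with probability 0.1
mask = np.random.choice([1, 0], a.shape, p=[.1, .9]).astype(bool)
a[mask] = np.nan  # overwrite the selected positions with NaN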

Result:

array([ 1.2769248 ,  0.5949608 , -1.38006737,  0.3582266 , -1.852884  ,
        0.81121663, -1.45830948,  0.03117856,  0.54509948,  1.22019729,
        1.71643753,         nan, -0.32470862, -0.77604474,  0.76698089,
       -0.47863251,         nan, -0.33308071, -0.32026717,  1.8493752 ])
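
Since the question asks about a DataFrame, the same masking idea might carry over roughly as follows. This is only a sketch under assumptions: the shape, the column names, the seed, and the use of np.random.default_rng in place of np.random.randn are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# 5x3 frame of standard-normal floats (shape and column names are illustrative)
df = pd.DataFrame(rng.standard_normal((5, 3)), columns=["a", "b", "c"])

# True marks a cell to blank out; each cell is picked independently with probability 0.1
nan_mask = rng.random(df.shape) < 0.1

# DataFrame.mask replaces values where the condition is True (with NaN by default)
df_with_nan = df.mask(nan_mask)
```

Because each cell is drawn independently, the fraction of NaN values is 0.1 only on average; a small frame may contain none at all.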
Mark

If you are working with a DataFrame, you can use apply on a column and assign the result back.

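import numpy as np
import pandas as pd

df = pd.DataFrame()
df['a'] = np.zeros(10)  # or get data from somewhere else
p = 2/7  # probability that a given row becomes NaN
# apply returns a new Series, so assign it back to keep the result.
# np.random.rand() is uniform on [0, 1); use np.random.randn() for standard-normal values.
df['a'] = df['a'].apply(lambda x: np.nan if np.random.rand() < p else np.random.rand())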
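
apply calls a Python function once per row, which can be slow on large columns. A vectorized variant could look something like the sketch below. The names rng, values, and is_nan, the seed, and the switch to standard-normal draws (to match the randn in the question) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility
n = 10
p = 2 / 7  # probability that a given row becomes NaN

values = rng.standard_normal(n)  # candidate values for every row
is_nan = rng.random(n) < p       # True where the row should be NaN

# np.where picks NaN where is_nan is True and the drawn value elsewhere
df = pd.DataFrame({"a": np.where(is_nan, np.nan, values)})
```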
Jónás Balázs