I would like to generate random floating-point numbers, including NaN,
in a DataFrame with np.random.randn

boboo
-
What distribution of NaNs would you like? E.g., NaN with probability p and a uniform random value with probability 1-p? – Jónás Balázs Dec 30 '18 at 22:37
-
I would like 2/7 NaNs. – boboo Dec 30 '18 at 22:42
-
I suggest checking this [question](https://stackoverflow.com/questions/34962104/pandas-how-can-i-use-the-apply-function-for-a-single-column) – Jónás Balázs Dec 30 '18 at 22:44
2 Answers
You can generate an array of random floats, then create a Boolean mask with np.random.choice, using its p argument to set the probability that each entry is replaced by NaN.
Something like:
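import numpy as np
a = np.random.randn(20)  # 20 standard-normal floats
# Boolean mask: each entry is True with probability 0.1
mask = np.random.choice([1, 0], a.shape, p=[.1, .9]).astype(bool)
a[mask] = np.nan  # overwrite the masked entries with NaN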
Result:
array([ 1.2769248 , 0.5949608 , -1.38006737, 0.3582266 , -1.852884 ,
0.81121663, -1.45830948, 0.03117856, 0.54509948, 1.22019729,
1.71643753, nan, -0.32470862, -0.77604474, 0.76698089,
-0.47863251, nan, -0.33308071, -0.32026717, 1.8493752 ])
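
For the 2/7 proportion requested in the comments, the same masking idea can be applied to a whole DataFrame. The following is only a sketch under some assumptions: the shape, the column names and the seed are made up for illustration, and it uses NumPy's Generator API (`default_rng`, `standard_normal`) in place of `np.random.randn`, which would work just as well. Each cell becomes NaN independently with probability 2/7, so the observed share of NaNs is close to, but not exactly, 2/7.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Hypothetical shape and column names, for illustration only.
df = pd.DataFrame(rng.standard_normal((1000, 3)), columns=["a", "b", "c"])

p_nan = 2 / 7
# True where a value should be replaced by NaN; each cell is drawn independently.
nan_mask = rng.random(df.shape) < p_nan

# DataFrame.mask replaces values where the condition is True (with NaN by default).
df_with_nan = df.mask(nan_mask)

# Observed share of NaN cells; close to 2/7 but not exactly equal.
nan_fraction = df_with_nan.isna().to_numpy().mean()
```

If exactly 2/7 of the entries must be NaN, a random mask like this is not enough; the positions would have to be sampled without replacement instead.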

Mark
If you are working with a DataFrame, you can use apply.
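import numpy as np
import pandas as pd
df = pd.DataFrame()
df['a'] = np.zeros(10) # or get data from somewhere else
p = 2/7  # probability that a value becomes NaN
# Replace each value with NaN with probability p, otherwise with a uniform random float in [0, 1)
df['a'] = df.a.apply(lambda x: np.nan if np.random.rand() < p else np.random.rand())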
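
For larger columns, the per-element apply can be swapped for a vectorized version. This is a sketch of that alternative, not part of the original answer: the column length `n` and the seed are assumptions, and `np.where` picks NaN or the pre-drawn uniform value for each position.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

n = 1000   # hypothetical column length
p = 2 / 7  # probability that an entry is NaN

# Draw all uniform values at once, then decide independently which ones to blank out.
values = rng.random(n)
is_nan = rng.random(n) < p

# np.where takes NaN where is_nan is True and the uniform value elsewhere.
df = pd.DataFrame({"a": np.where(is_nan, np.nan, values)})
```

The result has the same distribution as the apply version, but the random numbers are generated in two array calls rather than once or twice per row.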

Jónás Balázs