3

I have a dataframe of numbers such as:

A           B
2019-10-31  0.035333
2019-10-31  NaN
2019-11-30  -0.108532
2019-11-30  -0.030604
2019-11-30  NaN

I want to replace each NaN in column B with a random Gaussian number:

from random import seed
from random import gauss
# seed random number generator
seed(1)
# generate some Gaussian value
value = gauss(0, 0.1)

However, if I use the following code:

df.fillna(gauss(0, 0.1))

It fills all missing values with the same random value, whereas I want a new random value for each NaN. How can I achieve this?

t.pellegrom
    Does this answer your question? https://stackoverflow.com/questions/36227010/using-a-custom-function-series-in-fillna – Zoro May 05 '21 at 13:51

4 Answers

1

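You can generate the whole array with np.random, then fill the NaNs with loc:

import numpy as np

# boolean mask marking the missing entries in column B
mask = df['B'].isna()

# draw one Gaussian value per missing entry and assign them in place
to_fill = np.random.normal(0, 0.1, size=mask.sum())
df.loc[mask, 'B'] = to_fill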
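
A self-contained sketch of the mask-and-assign idea above, for anyone who wants to run it directly. The sample frame is rebuilt from the question's data, and the seeded `np.random.default_rng(1)` generator is an assumption added here only to make the run reproducible; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question's data
df = pd.DataFrame({
    "A": ["2019-10-31", "2019-10-31", "2019-11-30", "2019-11-30", "2019-11-30"],
    "B": [0.035333, np.nan, -0.108532, -0.030604, np.nan],
})

# Seeded generator (assumption: used here only for reproducibility)
rng = np.random.default_rng(1)

# One independent draw per missing entry, assigned only where B is NaN
mask = df["B"].isna()
df.loc[mask, "B"] = rng.normal(loc=0.0, scale=0.1, size=int(mask.sum()))

print(df)
```

Only as many values are drawn as there are NaNs, and the existing entries are left untouched.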
Quang Hoang
1
df.B.where(df.B.notna(), np.random.randn(len(df.index))*0.1 + 0)

Where column B is not NaN, keep the value as is; otherwise take it from np.random.randn (scaled by 0.1)

to get

0    0.035333
1   -0.006504
2   -0.108532
3   -0.030604
4   -0.337191
Name: B, dtype: float64
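
A small sketch of the same `where` idea, assuming you want to keep the result: `Series.where` returns a new Series and leaves the original unchanged, so it has to be assigned. The names `s`, `noise`, and `filled` are illustrative, and the seeded generator is an assumption for reproducibility.

```python
import numpy as np
import pandas as pd

s = pd.Series([0.035333, np.nan, -0.108532, -0.030604, np.nan], name="B")

# One candidate value per row; only those at NaN positions end up being used
rng = np.random.default_rng(0)
noise = pd.Series(rng.normal(0.0, 0.1, size=len(s)), index=s.index)

# Keep s where it is not NaN, otherwise take the aligned value from noise
filled = s.where(s.notna(), noise)

print(filled)
```

This draws a value for every row, including rows that are not missing, which is slightly wasteful on large frames but keeps the expression to one line.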
Mustafa Aydın
1

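pandas also offers the replace function. Pass np.nan rather than the string 'NaN', which would not match a real missing value. Note that gauss is evaluated only once here, so every NaN receives the same value:

df.replace(np.nan, gauss(0, 0.1))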

Output:

            A         B
0  2019-10-31  0.035330
1  2019-10-31 -0.036289
2  2019-11-30 -0.108532
3  2019-11-30 -0.030604
4  2019-11-30 -0.036289
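
The repeated value in rows 1 and 4 can be checked with a short sketch. It assumes the question's column B rebuilt as a Series named `s`; the point is that `gauss(0, 0.1)` runs once, before `fillna` or `replace` is called, so both methods receive a single scalar.

```python
from random import gauss, seed

import numpy as np
import pandas as pd

seed(1)
s = pd.Series([0.035333, np.nan, -0.108532, -0.030604, np.nan], name="B")
was_nan = s.isna()

# gauss() is evaluated once; the resulting scalar fills every NaN
via_fillna = s.fillna(gauss(0, 0.1))
via_replace = s.replace(np.nan, gauss(0, 0.1))

# Number of distinct fill values used by each approach
print(via_fillna[was_nan].nunique(), via_replace[was_nan].nunique())
```

Both counts come out as one distinct value, which is the behaviour the question is trying to avoid.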
top talent
1

Or if you just want to use gauss:

df['B'] = df['B'].fillna(df['B'].apply(lambda x: gauss(0,.1)))

Output:

            A         B
0  2019-10-31  0.035333
1  2019-10-31 -0.143683
2  2019-11-30 -0.108532
3  2019-11-30 -0.030604
4  2019-11-30  0.054647
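
A variant sketch that still uses only `gauss` but draws exactly one value per missing entry instead of one per row. The boolean mask and list comprehension are a swapped-in technique, not the code of the answer above, and the seed is an assumption for reproducibility.

```python
from random import gauss, seed

import numpy as np
import pandas as pd

seed(1)
df = pd.DataFrame({"B": [0.035333, np.nan, -0.108532, -0.030604, np.nan]})

# Call gauss() once for each NaN and assign the draws in order
mask = df["B"].isna()
df.loc[mask, "B"] = [gauss(0, 0.1) for _ in range(int(mask.sum()))]

print(df)
```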
Scott Boston