0

I have to generate random values from an interval for a machine learning task. I want a normal distribution within a range in NumPy, and I searched the web for it. I found the question "How to get a normal distribution within a range in numpy?", but my data has no column for the standard deviation.

The values are like this :

−21.8 ± 6.7
−4.3 ± 0.1
−7.4 ± 0.5

So I know the minimum and maximum values, but there is nothing about the standard deviation.

Thanks.

Edit:

I want to generate 10 values from each of these lines. The first value is the mean. The second value is the distance from the mean to the max (and to the min). To be clear:


x = -21.8

The minimum value of the graph will be -21.8 - 6.7 = -28.5

The maximum value of the graph will be -21.8 + 6.7 = -15.1

desertnaut
kukuro
  • 6
    The normal distribution is unbounded, so I'm not sure what you want - I'm guessing the second value you have _is_ the standard deviation. – miradulo May 14 '20 at 16:17
  • Thanks for your answer. -21.8 is the base value and I want to generate 10 values between -28.5 (-21.8 - 6.7) and -15.1 (-21.8 + 6.7), so there is an interval. These values have to follow a Gaussian. @miradulo – kukuro May 14 '20 at 16:44
  • 1
    I’m voting to close this question because it is not about programming and it is based upon a fundamental misunderstanding of a statistical concept – desertnaut May 14 '20 at 19:22
  • @desertnaut Thanks for your answer. It helps so much. – kukuro May 14 '20 at 19:30
  • It is just a comment generated automatically when voting for this specific reason. Happy it was helpful - you are very welcome. – desertnaut May 14 '20 at 19:31

3 Answers

2

You can use scipy.stats.truncnorm to draw samples from a truncated normal random variable. However, you still need to specify the mean and the standard deviation, as for any normal random variable. I understand that you don't know the std, yet it greatly affects how the data are generated. Let's look at a few examples, going from one extreme to the other:

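from scipy.stats import truncnorm
import matplotlib.pyplot as plt
import seaborn as sns

m = -21.8  # mean
w = 6.7    # half-width of the allowed interval

for s in [0.5, 2, 7]:  # candidate standard deviations
    # truncnorm takes its bounds in units of standard deviations from loc
    lower, upper = -w / s, w / s
    r = truncnorm(a=lower, b=upper, loc=m, scale=s)

    size = 1_000
    sample = r.rvs(size)
    # histplot replaces distplot, which is deprecated in recent seaborn releases
    sns.histplot(sample, kde=True, stat="density")

plt.show()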

Which results in:

[image: density plots of the three samples]

You can see that for s=7 the distribution is almost flat, close to uniform. For s=0.5, on the other hand, a draw is extremely unlikely to fall outside the range, because the bounds sit 13-14 standard deviations from the mean.
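To get the 10 values per line asked for in the question, something along these lines might work. It is only a sketch: the helper name `sample_in_interval` is made up for illustration, and the choice std = half-width / 2 is an assumption (it puts the bounds at ±2 standard deviations), not something the data tells you.

```python
import numpy as np
from scipy.stats import truncnorm

def sample_in_interval(mean, half_width, std, size=10, seed=None):
    """Draw `size` values from normal(mean, std) truncated to mean ± half_width."""
    # truncnorm expects the bounds in units of standard deviations from loc
    a, b = -half_width / std, half_width / std
    return truncnorm.rvs(a, b, loc=mean, scale=std, size=size, random_state=seed)

# (mean, half-width) pairs taken from the question
rows = [(-21.8, 6.7), (-4.3, 0.1), (-7.4, 0.5)]

samples = {}
for mean, half_width in rows:
    # Assumption: std = half_width / 2, so the bounds sit at ±2 standard deviations
    samples[mean] = sample_in_interval(mean, half_width, std=half_width / 2, seed=0)
    print(mean, np.round(samples[mean], 2))
```

A smaller std concentrates the values around the mean, a larger one spreads them towards a uniform draw over the interval, as in the plots above.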

FBruzzesi
  • s is not the standard deviation. It gives the max and min values: M+S and M-S are the max and min. – kukuro May 14 '20 at 18:44
  • I understand that, yet to sample from a normal random variable you need to specify the mean and std. I will edit my answer to show how different std values greatly affect what you get. – FBruzzesi May 14 '20 at 18:51
-1

You can get mu and std from your data this way (you must supply all the data, not only the min and max):

import numpy as np
from scipy.stats import norm

data = np.array([1, 2, 3, 4])
mu, std = norm.fit(data)  # maximum-likelihood estimates of mean and std
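As a quick check of what the fit returns, here is a small sketch on the same toy data. For a normal distribution the fitted values should match the sample mean and the population standard deviation (ddof=0), so plain NumPy gives the same numbers.

```python
import numpy as np
from scipy.stats import norm

data = np.array([1, 2, 3, 4])
mu, std = norm.fit(data)

# The fitted values coincide with the sample mean and the population std (ddof=0)
print(mu, std)
print(np.isclose(mu, data.mean()), np.isclose(std, data.std()))
```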
FBruzzesi
jwzinserl
  • Every line holds a different pair of values. For the first line I only have -21.8 and 6.7, and I want to create a Gaussian between -15.1 and -28.5. – kukuro May 14 '20 at 16:16
-1

from here

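import numpy as np

mean = -21.8
std = 6.7  # used here both as the std and as the half-width of the interval
top = mean + std
bottom = mean - std

size = 5
a = np.random.normal(loc=mean, scale=std, size=size)

# Redraw every sample that falls outside (bottom, top) until none remain
redraw_mask = (a <= bottom) | (a >= top)
while redraw_mask.any():
    a[redraw_mask] = np.random.normal(loc=mean, scale=std, size=redraw_mask.sum())
    redraw_mask = (a <= bottom) | (a >= top)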

out:

Gulzar