48

I want to be able to pick values from a normal distribution that only ever fall between 0 and 1. In some cases I want to be able to basically just return a completely random distribution, and in other cases I want to return values that fall in the shape of a gaussian.

At the moment I am using the following function:

def blockedgauss(mu,sigma):
    while True:
        numb = random.gauss(mu,sigma)
        if (numb > 0 and numb < 1):
            break
    return numb

It picks a value from a normal distribution, then discards it if it falls outside of the range 0 to 1, but I feel like there must be a better way of doing this.

desertnaut
  • 57,590
  • 26
  • 140
  • 166
Catherine Georgia
  • 879
  • 3
  • 13
  • 17
  • 2
    If you "block" values < 0 and > 1, will it still be a gaussian distribution? – Hans Then Aug 26 '13 at 10:45
  • 1
    it will not be a gaussian distribution, but in some cases i don't want a gaussian distribution. i want to return a distribution which is tunable between being a random distribution (picking from a very wide gaussian), to something very close to a delta function (where the gaussian becomes very narrow) – Catherine Georgia Aug 26 '13 at 10:48

10 Answers10

67

It sounds like you want a truncated normal distribution. Using scipy, you could use scipy.stats.truncnorm to generate random variates from such a distribution:

import matplotlib.pyplot as plt
import scipy.stats as stats

lower, upper = 3.5, 6
mu, sigma = 5, 0.7
X = stats.truncnorm(
    (lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma)
N = stats.norm(loc=mu, scale=sigma)

fig, ax = plt.subplots(2, sharex=True)
ax[0].hist(X.rvs(10000), normed=True)
ax[1].hist(N.rvs(10000), normed=True)
plt.show()

enter image description here

The top figure shows the truncated normal distribution, the lower figure shows the normal distribution with the same mean mu and standard deviation sigma.

Gabriel
  • 40,504
  • 73
  • 230
  • 404
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • I use sigma = (upper-lower)/sigma_span as convenience. For sigma_span = 4 (+2s and -2s), 95% of values of a normal distribution would fall into the bounds. sigma_span=1 produces a very broad distribution, higher values (>6) will be almost indistinguishable from a normal distribution (most values will not reach the bounds). – squarespiral Jul 31 '19 at 13:32
  • Note that numpy.histogram normred har since been depricated. "density" now provides the equivalient [functionality.]( https://numpy.org/doc/stable/reference/generated/numpy.histogram.html) – FrostMarble May 02 '22 at 09:22
19

I came across this post while searching for a way to return a series of values sampled from a normal distribution truncated between zero and 1 (i.e. probabilities). To help anyone else who has the same problem, I just wanted to note that scipy.stats.truncnorm has the built-in capability ".rvs".

So, if you wanted 100,000 samples with a mean of 0.5 and standard deviation of 0.1:

import scipy.stats
lower = 0
upper = 1
mu = 0.5
sigma = 0.1
N = 100000

samples = scipy.stats.truncnorm.rvs(
          (lower-mu)/sigma,(upper-mu)/sigma,loc=mu,scale=sigma,size=N)

This gives a behavior very similar to numpy.random.normal, but within the bounds desired. Using the built-in will be substantially faster than looping to gather samples, especially for large values of N.

Greg Alushin
  • 191
  • 1
  • 2
6

I have made an example script by the following. It shows how to use the APIs to implement the functions we wanted, such as generate samples with known parameters, how to compute CDF, PDF, etc. I also attach an image to show this.

#load libraries   
import scipy.stats as stats

#lower, upper, mu, and sigma are four parameters
lower, upper = 0.5, 1
mu, sigma = 0.6, 0.1

#instantiate an object X using the above four parameters,
X = stats.truncnorm((lower - mu) / sigma, (upper - mu) / sigma, loc=mu, scale=sigma)

#generate 1000 sample data
samples = X.rvs(1000)

#compute the PDF of the sample data
pdf_probs = stats.truncnorm.pdf(samples, (lower-mu)/sigma, (upper-mu)/sigma, mu, sigma)

#compute the CDF of the sample data
cdf_probs = stas.truncnorm.cdf(samples, (lower-mu)/sigma, (upper-mu)/sigma, mu, sigma)

#make a histogram for the samples
plt.hist(samples, bins= 50,normed=True,alpha=0.3,label='histogram');

#plot the PDF curves 
plt.plot(samples[samples.argsort()],pdf_probs[samples.argsort()],linewidth=2.3,label='PDF curve')

#plot CDF curve        
plt.plot(samples[samples.argsort()],cdf_probs[samples.argsort()],linewidth=2.3,label='CDF curve')


#legend
plt.legend(loc='best')

enter image description here

FlyFish
  • 61
  • 1
  • 3
5

In case anybody wants a solution using numpy only, here is a simple implementation using a normal function and a clip (the MacGyver's approach):

    import numpy as np
    def truncated_normal(mean, stddev, minval, maxval):
        return np.clip(np.random.normal(mean, stddev), minval, maxval)

EDIT: do NOT use this!! this is how you shouldn't do it!! for instance,
a = truncated_normal(np.zeros(10000), 1, -10, 10)
may look like it works, but
b = truncated_normal(np.zeros(10000), 100, -1, 1)
will definitely not draw a truncated normal, as you can see in the following histogram:

enter image description here

Sorry for that, hope nobody got hurt! I guess the lesson is, don't try to emulate MacGyver at coding... Cheers,
Andres

fr_andres
  • 5,927
  • 2
  • 31
  • 44
  • 1
    who is MacGyver? – d56 Jun 04 '21 at 14:37
  • It's an audatious character that can build complex systems out of plain regular items to solve difficult situations. This answer is a bit of a meme, please indulge the MacGyver reference! – fr_andres Jun 04 '21 at 15:39
  • 3
    Shouldn't this be deleted? The warning is quite visible - but this still looks like sth that can at most mislead. – Lukas Wallrich Jun 27 '22 at 11:42
  • 2
    At most this can deter others from making the same mistake. My hope is that this is the case. Also the plot is more than quite visible – fr_andres Jun 27 '22 at 11:44
  • I would add the warning to the top, I almost missed it but didn't only because I knew it makes no sense. Not to be rude, but you can either hope others are responsible and reading all, or you can make sure your suggestion is not misleading :) Sometimes the reader is missing stuff. – Maayao May 25 '23 at 13:54
2

I have tested some solutions using numpy. Through trial and error method, I found out that ± variation divided by 3 is a good guess for standard deviation.

Following you have some examples:

The basics

import numpy as np
import matplotlib.pyplot as plt

val_min = 1000
val_max = 2000
variation = (val_max - val_min)/2
std_dev = variation/3
mean = (val_max + val_min)/2
dist_normal = np.random.normal(mean, std_dev,  1000)
print('Normal distribution\n\tMin: {0:.2f}, Max: {1:.2f}'
      .format(dist_normal.min(), dist_normal.max()))
plt.hist(dist_normal, bins=30)
plt.show()

Histogram 1

A comparative case

import numpy as np
import matplotlib.pyplot as plt

val_min = 1400
val_max = 2800
variation = (val_max - val_min)/2
std_dev = variation/3
mean = (val_max + val_min)/2
fig, ax = plt.subplots(3, 3)
plt.suptitle("Histogram examples by Davidson Lima (github.com/davidsonlima)", 
             fontweight='bold')
i = 0
j = 0
pos = 1
while (i < 3):
    while (j < 3):
        dist_normal = np.random.normal(mean, std_dev,  1000)
        max_min = 'Min: {0:.2f}, Max: {1:.2f}'.format(dist_normal.min(), dist_normal.max())
        ax[i, j].hist(dist_normal, bins=30, label='Dist' + str(pos))
        ax[i, j].set_title('Normal distribution ' + str(pos))
        ax[i, j].legend()
        ax[i, j].text(mean, 0, max_min, horizontalalignment='center', color='white',
                      bbox={'facecolor': 'red', 'alpha': 0.5})
        print('Normal distribution {0}\n\tMin: {1:.2f}, Max: {2:.2f}'
              .format(pos, dist_normal.min(), dist_normal.max()))
        j += 1
        pos += 1
    j = 0
    i += 1
plt.show()

Histogram 2

If someone has a better approach with numpy, please comment below.

Davidson Lima
  • 788
  • 9
  • 17
1

The parametrization of truncnorm is complicated, so here is a function that translates the parametrization to something more intuitive:

from scipy.stats import truncnorm

def get_truncated_normal(mean=0, sd=1, low=0, upp=10):
    return truncnorm(
        (low - mean) / sd, (upp - mean) / sd, loc=mean, scale=sd)


How to use it?

  1. Instance the generator with the parameters: mean, standard deviation, and truncation range:

     >>> X = get_truncated_normal(mean=8, sd=2, low=1, upp=10)
    
  2. Then, you can use X to generate a value:

     >>> X.rvs()
     6.0491227353928894
    
  3. Or, a numpy array with N generated values:

     >>> X.rvs(10)
     array([ 7.70231607,  6.7005871 ,  7.15203887,  6.06768994,  7.25153472,
             5.41384242,  7.75200702,  5.5725888 ,  7.38512757,  7.47567455])
    

A Visual Example

Here is the plot of three different truncated normal distributions:

X1 = get_truncated_normal(mean=2, sd=1, low=1, upp=10)
X2 = get_truncated_normal(mean=5.5, sd=1, low=1, upp=10)
X3 = get_truncated_normal(mean=8, sd=1, low=1, upp=10)

import matplotlib.pyplot as plt
fig, ax = plt.subplots(3, sharex=True)
ax[0].hist(X1.rvs(10000), normed=True)
ax[1].hist(X2.rvs(10000), normed=True)
ax[2].hist(X3.rvs(10000), normed=True)
plt.show()

enter image description here

toto_tico
  • 17,977
  • 9
  • 97
  • 116
0

actually, you can normalize the data, then transit it to the range you need. sorry for firstly use, i dont know how to show pictures directly the function is shown

0

I developed a simple function for creating a list of values in a range using numpy.random.normal and some extra code.

def truncnormal(meanv, sd, minv, maxv, n):
    finallist = []
    initiallist = []
    while len(finallist) < n:
        initiallist = list(np.random.normal(meanv, sd, n))
        initiallist.sort()
        indexmin = 0
        indexmax = 0
        for item in initiallist:
            if item < minv:
                indexmin = indexmin + 1
            else:
                break
        for item in initiallist[::-1]:
            if item > maxv:
                indexmax = indexmax + 1
            else:
                break
        indexmax = -indexmax
        finallist = finallist + initiallist[indexmin:indexmax]
    shuffle(finallist)
    finallist = finallist[:n] 
    print(len(finallist), min(finallist), max(finallist))

truncnormal(10, 3, 8, 11, 10000)
DiegoR
  • 61
  • 1
  • 4
0

Here's a simple function to do that:

def norm_range(s, e, n, nsd=3):
"""Returns normally distributed elements within a range.

Arguments:
s -- start value of range
e -- end vale of range
n -- number of elements required
nsd -- number of standard deviations within the range (default 3)
"""
m = (s + e)/2 #mean
sd = (e - s)/(nsd*2) #std dev
r = np.random.normal(m, sd, n) #generate required elements
r = r[(r>=s) & (r<=e)] #truncate oob elements
while len(r) < n:
    rex = np.random.normal(m, sd, 2*(n - len(r))) #generate extra elements
    r = np.append(r, rex[(rex>=s) & (rex<=e)]) #truncate oob and append
return np.random.choice(r, size=n, replace=False) #return n

It returns normally distributed n elements with mean at the center of the given range and by default covers 3 standard deviations.

mynk
  • 1,194
  • 2
  • 13
  • 16
0

If you don't want to use scipy's truncnorm, here is a simple NumPy function that redraws samples out of bounds:

def limited_normal(mu, sig, size, lo = -np.inf, hi = np.inf):
    A = np.random.normal(mu, sig, size)
    bad = np.where((A < lo) | (A > hi))
    n_bad = len(bad[0])
    if n_bad:
        A[bad] = limited_normal(mu, sig, n_bad, lo, hi)
    return A

print(limited_normal(1, 4, (4, 4), -2, -1))

mu and sig has to be scalars.

Björn Lindqvist
  • 19,221
  • 20
  • 87
  • 122