How to generate a random normal distribution of integers

Question

How to generate a random integer as with np.random.randint(), but with a normal distribution around 0.

np.random.randint(-10, 10) returns integers with a discrete uniform distribution np.random.normal(0, 0.1, 1) returns floats with a normal distribution

What I want is a kind of combination between the two functions.

The normal distribution is continuous by definition, so the answer to this question depends on how you want to discretise it. One possible solution is to sample from `np.random.normal` and round the result to an integer. — Will Vousden, May 24 '16 at 11:07

score 43 · Accepted Answer · edited Feb 06 '21 at 09:25

43

One other way to get a discrete distribution that looks like the normal distribution is to draw from a multinomial distribution where the probabilities are calculated from a normal distribution.

import scipy.stats as ss
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(-10, 11)
xU, xL = x + 0.5, x - 0.5 
prob = ss.norm.cdf(xU, scale = 3) - ss.norm.cdf(xL, scale = 3)
prob = prob / prob.sum() # normalize the probabilities so their sum is 1
nums = np.random.choice(x, size = 10000, p = prob)
plt.hist(nums, bins = len(x))

Here, np.random.choice picks an integer from [-10, 10]. The probability for selecting an element, say 0, is calculated by p(-0.5 < x < 0.5) where x is a normal random variable with mean zero and standard deviation 3. I chose a std. dev. of 3 because this way p(-10 < x < 10) is almost 1.

The result looks like this:

edited Feb 06 '21 at 09:25

Moot

2,195
2
17
14

answered May 24 '16 at 11:48

ayhan

70,170
20
182
203

Why did you add and subtract 0.5 in `xL` and `xU`? – Kenenbek Arzymatov Sep 15 '16 at 14:07
4

For a continuous distribution, P(x=0)=0 (this is true for any other number). The probabiltiy is defined for intervals. Here, in order to associate a probability to 0 (and to other integers) I used the interval (-0.5, 0.5) It was basically because the question asked for integers. For 1, it is (0.5, 1.5). – ayhan Sep 15 '16 at 14:13
1

Why didn't you take `ss.norm.pdf(x, scale=3)`? – Kenenbek Arzymatov Sep 15 '16 at 14:22
3

It would work fine with these parameters but with a smaller standard deviation you would end up pdf's greater than 1, for example. It would be fine as long as you do the same normalization (dividing by the sum) afterwards but I didn't want to confuse potential readers since pdf is actually not probability (that's why it can be greater than 1) so I wanted to use actual probabilities. – ayhan Sep 15 '16 at 14:30

score 16 · Answer 2 · edited May 23 '17 at 11:54

16

It may be possible to generate a similar distribution from a Truncated Normal Distribution that is rounded up to integers. Here's an example with scipy's truncnorm().

import numpy as np
from scipy.stats import truncnorm
import matplotlib.pyplot as plt

scale = 3.
range = 10
size = 100000

X = truncnorm(a=-range/scale, b=+range/scale, scale=scale).rvs(size=size)
X = X.round().astype(int)

Let's see what it looks like

bins = 2 * range + 1
plt.hist(X, bins)

edited May 23 '17 at 11:54

Community

1
1

answered May 24 '16 at 11:03

bakkal

54,350
12
131
107

1

I appreciate the responses from both @ayhan and bakkal. Please, I only ask this for my knowledge; I do not wish to insult either answer. Just looking at the plot, Bakkal's is more symmetric. They both seem sufficient and from the code seem equally valid. But my understanding is weak. Is there a objectively better method? – Robert Lugg Nov 17 '17 at 18:33
2

@RobertLugg the relatively higher symmetry may be due to sample size being larger. That said, I think the code in this answer is simpler. – bakkal Nov 17 '17 at 18:38
2

Be aware that you overwrite the python range function with this code. Try to use neutral variable names. – Rriskit Nov 20 '19 at 19:41
While the code is simpler, it is also slower. In my test, about 100 times slower than ayhan's solution. Still, it takes 3 seconds to generate 10.000 numbers 50 times. So in many cases this will be ok. – Pablo May 29 '20 at 14:47
This answer is slightly wrong, as the first and last buckets will have lower probabilities than other buckets. This is because `X.round()` will do natural rounding with `0.5` as the cutoff-point, while all generated real values are in [a,b). To fix this, use `np.floor(X).astype(int)` instead, which will yield integer values in [a,b) with equal probabilities. – bricklore Feb 23 '23 at 12:20

score 6 · Answer 3 · edited Feb 27 '19 at 02:48

6

The accepted answer here works, but I tried Will Vousden's solution and it works well too:

import numpy as np

# Generate Distribution:
randomNums = np.random.normal(scale=3, size=100000)
randomInts = np.round(randomNums)

# Plot:
axis = np.arange(start=min(randomInts), stop = max(randomInts) + 1)
plt.hist(randomInts, bins = axis)

edited Feb 27 '19 at 02:48

vs97

5,765
3
28
41

answered Feb 27 '19 at 02:05

stephan

356
3
9

6

Instead of generating `randomNums` and rounding them to "ints" (actually, reals ending in `.0`), what about `randomInts = np.random.normal(loc=10, scale=3, size=10000).astype(int)-10`, which returns actual ints? **Note**, however, that it's necessary to generate values with a `loc` other than `0` (and return it to `0` by subtracting the loc), or you'll end up with _far too many_ results at exactly `0`. – Dave Land Aug 21 '19 at 18:32

Warren Weckesser · Answer 4 · 2022-02-18T13:27:43.087

Old question, new answer:

For a bell-shaped distribution on the integers {-10, -9, ..., 9, 10}, you can use the binomial distribution with n=20 and p=0.5, and subtract 10 from the samples.

For example,

In [167]: import numpy as np

In [168]: import matplotlib.pyplot as plt

In [169]: rng = np.random.default_rng()

In [170]: N = 5000000   # Number of samples to generate

In [171]: samples = rng.binomial(n=20, p=0.5, size=N) - 10

In [172]: samples.min(), samples.max()
Out[172]: (-10, 10)

Note that the probability of -10 or 10 is pretty low, so you won't necessarily see them in any given sample, especially if you use a smaller N.

np.bincount() is an efficient way to generate a histogram for a collection of small nonnegative integers:

In [173]: counts = np.bincount(samples + 10, minlength=20)

In [174]: counts
Out[174]: 
array([     4,    104,    889,   5517,  22861,  73805, 184473, 369441,
       599945, 800265, 881140, 801904, 600813, 370368, 185082,  73635,
        23325,   5399,    931,     95,      4])

In [175]: plt.bar(np.arange(-10, 11), counts)
Out[175]: <BarContainer object of 21 artists>

I couldn't find `rng` imported, but it must be this: `rng = np.random.default_rng()`, which presumably stands for a "default random number generator". https://devdocs.io/numpy~1.20/reference/random/index#random-quick-start — Stefan_EOX, Feb 18 '22 at 11:17
Thanks for pointing that out, @Stefan_EOX. I updated the code to include the creation of `rng`. — Warren Weckesser, Feb 18 '22 at 13:29

score 1 · Answer 5 · answered Oct 27 '21 at 11:43

This version is mathematically not correct (because you crop the bell) but will do the job quick and easily understandable if preciseness is not needed that much:

def draw_random_normal_int(low:int, high:int):

    # generate a random normal number (float)
    normal = np.random.normal(loc=0, scale=1, size=1)

    # clip to -3, 3 (where the bell with mean 0 and std 1 is very close to zero
    normal = -3 if normal < -3 else normal
    normal = 3 if normal > 3 else normal

    # scale range of 6 (-3..3) to range of low-high
    scaling_factor = (high-low) / 6
    normal_scaled = normal * scaling_factor

    # center around mean of range of low high
    normal_scaled += low + (high-low)/2

    # then round and return
    return np.round(normal_scaled)

Drawing 100000 numbers results in this histogramm:

CopyPasteIt · Answer 6 · 2018-04-02T14:31:49.973

Here we start by getting values from the bell curve.

CODE:

#--------*---------*---------*---------*---------*---------*---------*---------*
# Desc: Discretize a normal distribution centered at 0
#--------*---------*---------*---------*---------*---------*---------*---------*

import sys
import random
from math import sqrt, pi
import numpy as np
import matplotlib.pyplot as plt

def gaussian(x, var):
    k1 = np.power(x, 2)
    k2 = -k1/(2*var)
    return (1./(sqrt(2. * pi * var))) * np.exp(k2)

#--------*---------*---------*---------*---------*---------*---------*---------#
while 1:#                          M A I N L I N E                             #
#--------*---------*---------*---------*---------*---------*---------*---------#
#                                  # probability density function
#                                  #   for discrete normal RV
    pdf_DGV = []
    pdf_DGW = []    
    var = 9
    tot = 0    
#                                  # create 'rough' gaussian
    for i in range(-var - 1, var + 2):
        if i ==  -var - 1:
            r_pdf = + gaussian(i, 9) + gaussian(i - 1, 9) + gaussian(i - 2, 9)
        elif i == var + 1:
            r_pdf = + gaussian(i, 9) + gaussian(i + 1, 9) + gaussian(i + 2, 9)
        else:
            r_pdf = gaussian(i, 9)
        tot = tot + r_pdf
        pdf_DGV.append(i)
        pdf_DGW.append(r_pdf)
        print(i, r_pdf)
#                                  # amusing how close tot is to 1!
    print('\nRough total = ', tot)
#                                  # no need to normalize with Python 3.6,
#                                  #   but can't help ourselves
    for i in range(0,len(pdf_DGW)):
        pdf_DGW[i] = pdf_DGW[i]/tot
#                                  # print out pdf weights
#                                  #   for out discrte gaussian
    print('\npdf:\n')
    print(pdf_DGW)

#                                  # plot random variable action
    rv_samples = random.choices(pdf_DGV, pdf_DGW, k=10000)
    plt.hist(rv_samples, bins = 100)
    plt.show()
    sys.exit()

OUTPUT:

-10 0.0007187932912256041
-9 0.001477282803979336
-8 0.003798662007932481
-7 0.008740629697903166
-6 0.017996988837729353
-5 0.03315904626424957
-4 0.05467002489199788
-3 0.0806569081730478
-2 0.10648266850745075
-1 0.12579440923099774
0 0.1329807601338109
1 0.12579440923099774
2 0.10648266850745075
3 0.0806569081730478
4 0.05467002489199788
5 0.03315904626424957
6 0.017996988837729353
7 0.008740629697903166
8 0.003798662007932481
9 0.001477282803979336
10 0.0007187932912256041

Rough total =  0.9999715875468381

pdf:

[0.000718813714486599, 0.0014773247784004072, 0.003798769940305483, 0.008740878047691289, 0.017997500190860556, 0.033159988420867426, 0.05467157824565407, 0.08065919989878699, 0.10648569402724471, 0.12579798346031068, 0.13298453855078374, 0.12579798346031068, 0.10648569402724471, 0.08065919989878699, 0.05467157824565407, 0.033159988420867426, 0.017997500190860556, 0.008740878047691289, 0.003798769940305483, 0.0014773247784004072, 0.000718813714486599]

JeeyCi · Answer 7 · 2022-03-18T13:49:11.863

I'm not sure if there (in scipy generator) is an option of var-type choice to be generated, but common generation can be such with scipy.stats

# Generate pseudodata  from a single normal distribution
import scipy
from scipy import stats
import numpy as np
import matplotlib.pyplot as plt

dist_mean = 0.0
dist_std = 0.5
n_events = 500

toy_data = scipy.stats.norm.rvs(dist_mean, dist_std, size=n_events)
toy_data2 = [[i, j] for i, j in enumerate(toy_data )]

arr = np.array(toy_data2)
print("sample:\n", arr[1:500, 0])
print("bin:\n",arr[1:500, 1])
plt.scatter(arr[1:501, 1], arr[1:501, 0])
plt.xlabel("bin")
plt.ylabel("sample")
plt.show()

or in such a way (also option of dtype choice is absent):

import matplotlib.pyplot as plt

mu, sigma = 0, 0.1 # mean and standard deviation
s = np.random.normal(mu, sigma, 500)

count, bins, ignored = plt.hist(s, 30, density=True)
plt.show()
print(bins)     # <<<<<<<<<<

plt.plot(bins, 1/(sigma * np.sqrt(2 * np.pi)) * np.exp( - (bins - mu)**2 / (2 * sigma**2) ),
          linewidth=2, color='r')
plt.show()

without visualization the most common way (also no possibility to point out var-type)

bins = np.random.normal(3, 2.5, size=(10, 1))

a wrapper class could be done to instantiate the container with a given vars-dtype (e.g. by rounding floats to integers, as mentioned above)...

How to generate a random normal distribution of integers

7 Answers7

Linked

Related