For a research project I am working on, I need to generate a set of random (or pseudo-random) data (say, 10,000 data points) with the following parameters:

  • Maximum value = 35;
  • Minimum Value = 1.5;
  • Mean = 9.87;
  • Standard Deviation = 3.1;

Now clearly this distribution will look somewhat like that generated with

scipy.stats.maxwell.rvs(loc=1.5, scale=3.1)

However, this does not give the required mean or maximum value. Is there a possible solution to this?

Joe Buckley
  • https://stackoverflow.com/questions/27831923/python-random-number-generator-with-mean-and-standard-deviation – Bayko May 31 '18 at 14:31
  • *"Now clearly this distribution will look somewhat like that generated with..."* Actually, that is not clear. The space of possible distributions that match those criteria is big. Really big. You just won't believe how vastly, hugely, mind-bogglingly big it is. – Warren Weckesser May 31 '18 at 16:04

1 Answer

You need to choose a probability distribution according to your needs. Several continuous distributions have bounded support. For example, you can pick the (scaled) beta distribution and compute the parameters α and β to fit your mean and standard deviation:

import numpy as np
import scipy.stats
import matplotlib.pyplot as plt

def my_distribution(min_val, max_val, mean, std):
    scale = max_val - min_val
    location = min_val
    # Mean and standard deviation of the unscaled beta distribution
    unscaled_mean = (mean - min_val) / scale
    unscaled_var = (std / scale) ** 2
    # Computation of alpha and beta can be derived from mean and variance formulas
    t = unscaled_mean / (1 - unscaled_mean)
    beta = ((t / unscaled_var) - (t * t) - (2 * t) - 1) / ((t * t * t) + (3 * t * t) + (3 * t) + 1)
    alpha = beta * t
    # Not all parameters may produce a valid distribution
    if alpha <= 0 or beta <= 0:
        raise ValueError('Cannot create distribution for the given parameters.')
    # Make scaled beta distribution with computed parameters
    return scipy.stats.beta(alpha, beta, scale=scale, loc=location)

np.random.seed(100)

min_val = 1.5
max_val = 35
mean = 9.87
std = 3.1
my_dist = my_distribution(min_val, max_val, mean, std)
# Plot distribution PDF
x = np.linspace(min_val, max_val, 100)
plt.plot(x, my_dist.pdf(x))
# Stats
print('mean:', my_dist.mean(), 'std:', my_dist.std())
# Get a large sample to check bounds
sample = my_dist.rvs(size=100000)
print('min:', sample.min(), 'max:', sample.max())

Output:

mean: 9.87 std: 3.100000000000001
min: 1.9290674232087306 max: 25.03903889816994

Probability density function plot:

[Figure: probability density function]
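
For reference, the expressions for `alpha` and `beta` in the code above can be recovered from the standard mean and variance formulas of the beta distribution. The sketch below uses symbols that are not in the code: a and b for the bounds, μ and σ for the target mean and standard deviation, and m and v for the mean and variance rescaled to the unit interval. The last line gives an equivalent closed form, which is an algebraic rearrangement rather than what the code literally computes.

```latex
\begin{align*}
m &= \frac{\mu - a}{b - a}, \qquad v = \left(\frac{\sigma}{b - a}\right)^{2} \\
m &= \frac{\alpha}{\alpha + \beta}, \qquad
v = \frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)} \\
t &= \frac{m}{1 - m} = \frac{\alpha}{\beta}
  \;\Rightarrow\; \alpha = \beta t, \quad \alpha + \beta = \beta(1 + t) \\
v &= \frac{t}{(1 + t)^{2}\,\bigl(\beta(1 + t) + 1\bigr)}
  \;\Rightarrow\;
  \beta = \frac{t/v - (1 + t)^{2}}{(1 + t)^{3}}
        = \frac{t/v - t^{2} - 2t - 1}{t^{3} + 3t^{2} + 3t + 1} \\
\alpha &= m\left(\frac{m(1 - m)}{v} - 1\right), \qquad
\beta = (1 - m)\left(\frac{m(1 - m)}{v} - 1\right)
\end{align*}
```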

Not every combination of bounds, mean and standard deviation produces a valid distribution in this case, and the beta distribution has particular properties that you may or may not want. Infinitely many distributions can match given bounds, mean and standard deviation while differing in other qualities (skew, kurtosis, modality, ...). You need to decide which distribution is best for your case.
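
To make the validity condition concrete: with the mean and variance rescaled to the unit interval (called `m` and `v` here), the computed `alpha` and `beta` are both positive exactly when `0 < m < 1` and `v < m * (1 - m)`. The following is a minimal sketch of that check; the function name `beta_shape_params` is hypothetical, and it uses a closed form that is algebraically equivalent to the `t`-based formula in the code above.

```python
def beta_shape_params(min_val, max_val, mean, std):
    """Return (alpha, beta) for a beta distribution scaled to [min_val, max_val].

    Hypothetical helper for illustration; raises ValueError when no beta
    distribution on the interval has the requested mean and standard deviation.
    """
    scale = max_val - min_val
    if scale <= 0:
        raise ValueError('max_val must be greater than min_val.')
    # Mean and variance rescaled to the unit interval
    m = (mean - min_val) / scale
    v = (std / scale) ** 2
    if not 0 < m < 1:
        raise ValueError('The mean must lie strictly between the bounds.')
    if not 0 < v < m * (1 - m):
        raise ValueError('The standard deviation is too large (or not positive) '
                         'for this mean and these bounds.')
    # Positive because v < m * (1 - m)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common


# Parameters from the question
alpha, beta = beta_shape_params(1.5, 35, 9.87, 3.1)
print('alpha:', alpha, 'beta:', beta)
```

The resulting pair can be passed to `scipy.stats.beta(alpha, beta, loc=min_val, scale=max_val - min_val)`, as in the code above.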

jdehesa
  • @WarrenWeckesser Yeah I noted that. I don't know if the scaling causes additional imprecision or what... It definitely looks like more than floating point imprecision but it doesn't look big enough to me to be an altogether wrong computation... I'm reviewing how I compute `alpha` and `beta` but I cannot see what might be wrong... – jdehesa May 31 '18 at 16:35
  • By the way, you should probably move your answer over to https://stackoverflow.com/questions/27831923/python-random-number-generator-with-mean-and-standard-deviation, as this question is a duplicate of that one. – Warren Weckesser May 31 '18 at 16:44
  • @WarrenWeckesser Not sure where I went wrong the first time but it's fixed now... I'll copy the answer to the other question and delete it if this one gets closed... – jdehesa May 31 '18 at 17:14