
I'm looking to use Python to generate some sample data.

I'd like to build a function that takes lower bound, upper bound, and size parameters. It should return a list of the given size containing floats between the lower and upper bounds that follow a normal distribution.

def generate_normal_dist_samples(lower_bound, upper_bound, size):
    # Generate the data here
    pass

Can this be done using numpy.random.normal?

One use case is generating employee salary test data. If the lower_bound is 50K and the upper_bound is 500K, how can I generate sample salaries that fall between the two but, when summarized, form a normal distribution?

Nick
  • Correct answer is here: https://stackoverflow.com/questions/36894191/how-to-get-a-normal-distribution-within-a-range-in-numpy#answer-44308018 – rjurney Sep 18 '19 at 01:24

2 Answers


The previous answers are right to suggest truncnorm, but since the question specifically asks about numpy.random.normal, here is a naive, somewhat hackish approach that uses it: draw from a normal distribution and discard any samples that fall outside the bounds.

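Note that the problem is somewhat underspecified, as it does not give the standard deviation of the normal distribution. The function below therefore takes an optional scale and, if none is given, defaults to half the range.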
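One common way to pin down the missing parameter (an assumption, not something the question specifies) is to centre the distribution between the lower bound L and upper bound U, then choose how many standard deviations k each bound should sit from the mean:

```latex
\mu = \frac{L + U}{2}, \qquad \sigma = \frac{U - L}{2k}
% Salary example with k = 3 (assumed):
% \mu = (50{,}000 + 500{,}000)/2 = 275{,}000, \quad \sigma = 450{,}000/6 = 75{,}000
```

With k = 3, about 99.7% of untruncated draws already land inside the bounds, so very few are discarded. A default of scale = (U - L)/2 corresponds to k = 1, where only about 68% land inside, so roughly a third of the draws are rejected and the kept samples look noticeably flatter than a bell curve.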

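import numpy

def generate_normal_dist_samples(lower_bound, upper_bound, size, scale=None):
    # Centre the normal distribution halfway between the bounds.
    loc = (lower_bound + upper_bound) / 2
    if scale is None:
        # Default: one standard deviation equals half the range.
        scale = (upper_bound - lower_bound) / 2
    results = []
    # Rejection sampling: keep drawing until enough samples fall inside the bounds.
    while len(results) < size:
        samples = numpy.random.normal(loc=loc, scale=scale, size=size - len(results))
        results += [sample for sample in samples if lower_bound <= sample <= upper_bound]
    return results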
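For comparison, here is a sketch of the same rejection idea written with NumPy's newer Generator API (np.random.default_rng) and boolean masking instead of a list comprehension. The function name truncated_normal_rejection is hypothetical, and the default scale of (upper - lower) / 6, which puts each bound three standard deviations from the centre, is an assumption rather than anything the question asks for.

```python
import numpy as np


def truncated_normal_rejection(lower_bound, upper_bound, size, scale=None, rng=None):
    """Hypothetical helper: draw `size` floats from a normal centred between
    the bounds, rejecting any draw outside [lower_bound, upper_bound]."""
    rng = np.random.default_rng() if rng is None else rng
    loc = (lower_bound + upper_bound) / 2
    if scale is None:
        # Assumption: place each bound 3 standard deviations from the centre.
        scale = (upper_bound - lower_bound) / 6
    kept = np.empty(0)
    while kept.size < size:
        # Only draw as many values as are still missing.
        draws = rng.normal(loc=loc, scale=scale, size=size - kept.size)
        # Boolean mask keeps in-range draws; the loop refills the rest.
        in_range = draws[(draws >= lower_bound) & (draws <= upper_bound)]
        kept = np.concatenate([kept, in_range])
    return kept


# Usage: 1,000 sample salaries between 50K and 500K, with a fixed seed.
rng = np.random.default_rng(seed=0)
salaries = truncated_normal_rejection(50_000, 500_000, 1_000, rng=rng)
print(salaries.min(), salaries.max(), salaries.mean())
```

Passing in a seeded Generator makes the test data reproducible, which is usually what you want for fixtures.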
Techniquab

You are looking for a truncated normal distribution, which SciPy provides as scipy.stats.truncnorm: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.truncnorm.html

from scipy.stats import truncnorm
r = truncnorm.rvs(a, b, size=1000)

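where a and b are the boundaries, expressed in standard deviations relative to loc (loc defaults to 0 and scale to 1). For arbitrary bounds, convert them first, i.e. a = (lower_bound - loc) / scale and b = (upper_bound - loc) / scale, and pass the same loc and scale to truncnorm.rvs.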
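Applied to the salary example, a sketch might look like the following. The helper name truncated_normal_scipy is hypothetical, and the defaults (mean halfway between the bounds, bounds three standard deviations away) are assumptions; the key point is that truncnorm's mean and spread come from its loc and scale arguments, while a and b are standardized against them.

```python
from scipy.stats import truncnorm


def truncated_normal_scipy(lower_bound, upper_bound, size, loc=None, scale=None,
                           random_state=None):
    # Assumed defaults: centre between the bounds, bounds at +/- 3 std devs.
    if loc is None:
        loc = (lower_bound + upper_bound) / 2
    if scale is None:
        scale = (upper_bound - lower_bound) / 6
    # truncnorm's a and b are measured in standard deviations from loc.
    a = (lower_bound - loc) / scale
    b = (upper_bound - loc) / scale
    return truncnorm.rvs(a, b, loc=loc, scale=scale, size=size,
                         random_state=random_state)


salaries = truncated_normal_scipy(50_000, 500_000, 1_000, random_state=42)
print(salaries.min(), salaries.max(), salaries.mean())
```

Unlike rejection sampling, this draws directly from the truncated distribution, so it never loops, even when the bounds cut off most of the bell.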

CodeZero
  • Without specifying the mean, this will grab the tail of the distribution between a and b, which is unlikely to be what the original post was looking for. – Techniquab May 16 '18 at 16:29
  • @Techniquab So what is the right solution? truncnorm doesn't take a mean. – rjurney Sep 15 '19 at 22:36
  • @rjurney I think this would come closest to what the original post was about r = truncnorm.rvs(-(a+b)/2, (a+b)/2, size=1000) +(a+b)/2 – Techniquab Sep 17 '19 at 14:10
  • A clean and correct answer is here: https://stackoverflow.com/questions/36894191/how-to-get-a-normal-distribution-within-a-range-in-numpy#answer-44308018 – rjurney Sep 18 '19 at 01:24