I am using Python 2.7.12 with Anaconda 4.2.0 (64-bit) and am trying to generate random numbers with the random module. I need to generate n numbers within an interval such that their mean equals a specified value. For example:

I want 14 randomly generated numbers between 42,000 and 91,000 and I want these numbers to have a mean of 60,000.

I know how to use random to produce integer numbers:

random.randint(42000, 91000)

and I can put this in a for loop, but how can I make their mean equal 60,000?

tcokyasar

2 Answers

There are many ways to generate random numbers between two values with a specific mean. Python has built-in functions for some of them, but far from all.

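The simplest way would be to use a triangular distribution. random.triangular(low, high, mode) will produce a number from a distribution between low and high, with a specified mode. This might be good enough, but if you need a specific mean, note that the mean of a triangular distribution is (low + high + mode) / 3, so you can solve for the mode. This only works when the resulting mode lies between low and high: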

import random

def triangular_mean(low, high, mean):
    # The triangular mean is (low + high + mode) / 3; solve for the mode.
    mode = 3 * mean - low - high
    return random.triangular(low, high, mode)
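As a rough check of the formula above, here is a minimal sketch that repeats the function (so the block is self-contained), adds a guard on the mode that is not in the original answer, and averages a large number of draws. The seed and the number of draws are arbitrary choices for illustration. The average of many draws should come out close to 60,000, while any single draw can land anywhere in the interval.

```python
import random

def triangular_mean(low, high, mean):
    # Mean of a triangular distribution is (low + high + mode) / 3,
    # so this is the mode that yields the requested mean.
    mode = 3 * mean - low - high
    # Guard added for illustration: the mode must stay inside the interval.
    if not low <= mode <= high:
        raise ValueError("mean must lie between (2*low + high)/3 and (low + 2*high)/3")
    return random.triangular(low, high, mode)

random.seed(0)  # fixed seed only so the run is repeatable
draws = [triangular_mean(42000, 91000, 60000) for _ in range(200000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # close to 60000, but not exactly equal
```

This controls the population mean only; the mean of a small sample (such as 14 numbers) will still vary from run to run.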

If you wanted to get more complex, you could use a beta distribution by calling random.betavariate(alpha, beta). These are really flexible and very weird; this image from Wikipedia highlights how strange they can be. [Image: Beta distribution examples]

The mean of a beta distribution is alpha/(alpha+beta), and the results are always between zero and one, so to scale it up to your use case, let's wrap it up in this function:

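import random

def beta_mean(low, high, mean, alpha):
    offset = low
    scale = float(high - low)  # float avoids integer division on Python 2
    true_mean = (mean - offset) / scale  # target mean rescaled to [0, 1]
    beta = (alpha / true_mean) - alpha  # so that alpha / (alpha + beta) == true_mean
    return offset + scale * random.betavariate(alpha, beta)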

In the above function, alpha will change the shape of the distribution without changing the mean; it will change the median, the mode, the variance, and other properties of the distribution.
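To illustrate that alpha changes the shape but not the mean, here is a small sketch that draws from the scaled beta distribution for a few alpha values and compares the average and the spread. It uses random.betavariate from the standard library; the helper stdev, the seed, the alpha values, and the sample size are assumptions made for this example.

```python
import random

def beta_mean(low, high, mean, alpha):
    scale = float(high - low)
    unit_mean = (mean - low) / scale      # target mean rescaled to [0, 1]
    beta = alpha / unit_mean - alpha      # gives alpha / (alpha + beta) == unit_mean
    return low + scale * random.betavariate(alpha, beta)

def stdev(values):
    # Population standard deviation, written out to stay dependency-free.
    m = sum(values) / len(values)
    return (sum((v - m) ** 2 for v in values) / len(values)) ** 0.5

random.seed(1)  # fixed seed only so the run is repeatable
means, spreads = {}, {}
for alpha in (0.5, 2.0, 10.0):
    draws = [beta_mean(42000, 91000, 60000, alpha) for _ in range(50000)]
    means[alpha] = sum(draws) / len(draws)
    spreads[alpha] = stdev(draws)
    print(alpha, means[alpha], spreads[alpha])
```

The averages should all sit near 60,000, while the spread shrinks as alpha grows.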

There are other distributions you could wrap in functions to fit your use case, but I'm going to guess that the above two will cover most of them.

These functions return floating-point numbers, so if you want integers you'll have to convert them, either by editing the functions or by converting the results after calling them.

Izaak Weiss
  • I believe the OP is interested in fixing the _sample_ mean rather than the _population_ mean. – Mark Dickinson Jul 11 '17 at 16:09
  • I probably didn't understand the function. When I run it a single time, I would expect it to yield 60,000, since we expect the mean to be 60,000. However, it yields 50585.37639745854, a random number that is not really close to 60,000. – tcokyasar Jul 11 '17 at 16:17
import random

while True:  # retry until the balancing value fits the range
  s = [ random.randint(42000, 91000) for _ in range(13) ]
  v = 14 * 60000 - sum(s)  # 14th value that makes the mean exactly 60000
  if 42000 <= v <= 91000:
    s.append(v)
    break
print(sum(s) / len(s))  # will print 60000 (60000.0 on Python 3)

This creates 13 random values from the standard generator and computes a 14th value so that the mean is exactly 60000. Since the 14th value might not be in the given range, it retries until a valid 14th value is found.

This is neither elegant nor nice. But so was the question.

EDIT:

This approach will work for the given numbers, but because it relies on retrying, it might run indefinitely for different numbers (e.g. range = [42k, 91k], mean = 60k, count = 100k).

If you create 99999 elements randomly, their mean will be around (42k+91k)/2 = 66.5k, and a single element cannot balance this back to 60k: the required value would practically always lie outside of the range. You could use a slightly more complex random number generator, producing random numbers between 42k and 91k with a mean of 60k (ask another question about this if you don't know how!). Using this other random number generator will improve your chances of termination.
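A minimal sketch of that idea, assuming a triangular distribution as the "more complex" generator (one possible choice, not necessarily the one intended here). The function name sample_with_mean and the max_tries cap are hypothetical additions for illustration. Draws are rounded to integers so that the balancing value, and therefore the sample mean, is exact.

```python
import random

def sample_with_mean(n, low, high, mean, max_tries=10000):
    # Mode chosen so the triangular distribution has the requested mean.
    mode = 3 * mean - low - high
    if not low <= mode <= high:
        raise ValueError("mean is out of reach for a triangular distribution")
    for _ in range(max_tries):
        # n - 1 integer draws whose expected value is already close to `mean`.
        values = [int(round(random.triangular(low, high, mode)))
                  for _ in range(n - 1)]
        last = n * mean - sum(values)  # value that makes the mean exact
        if low <= last <= high:
            values.append(last)
            return values
    raise RuntimeError("no valid sample found within max_tries")

random.seed(2)  # fixed seed only so the run is repeatable
sample = sample_with_mean(150, 42000, 91000, 60000)
print(sum(sample) / float(len(sample)))  # 60000.0
```

The retry count still grows with n, because the sum of n - 1 draws scatters more widely than the 49k-wide window the last value must hit, so this helps for moderate n rather than for very large ones.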

A second way to improve your chances of termination, and thus of finding a result, is to build your result out of smaller chunks, each of which has the desired mean: create 5000 chunks of 20 elements with the method I presented.
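The chunking idea could be sketched as follows. The function names balanced_chunk and sample_in_chunks are hypothetical, the final shuffle is an optional extra so the balancing values are not always at chunk boundaries, and the example uses 100 chunks of 20 rather than 5000 to keep the run short.

```python
import random

def balanced_chunk(size, low, high, mean):
    # Same retry method as above: size - 1 random values plus one balancing value.
    while True:
        values = [random.randint(low, high) for _ in range(size - 1)]
        last = size * mean - sum(values)
        if low <= last <= high:
            return values + [last]

def sample_in_chunks(n, chunk_size, low, high, mean):
    if n % chunk_size:
        raise ValueError("n must be a multiple of chunk_size in this sketch")
    result = []
    for _ in range(n // chunk_size):
        result.extend(balanced_chunk(chunk_size, low, high, mean))
    random.shuffle(result)  # optional: hide the chunk structure
    return result

random.seed(3)  # fixed seed only so the run is repeatable
result = sample_in_chunks(2000, 20, 42000, 91000, 60000)
print(sum(result) / float(len(result)))  # 60000.0
```

Because every chunk has a mean of exactly 60000, so does the combined list. The price is that the values are no longer independent of each other.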

You can combine both methods of course.

Alfe
  • This is the best answer to a paradoxical question. @user8028576, you are asking to build a discrete uniform distribution with `a=42000` and `b=91000`. The problem is, mathematically the mean of this distribution should be _(a + b) / 2_ = 66,500, or the average of `a` and `b`, not 60,000. So you are describing your distribution as _random_ when it is not; it exhibits skewness to conform to that lower mean that you want. – Brad Solomon Jul 11 '17 at 16:17
  • Answers my question. Thanks @Alfe. – tcokyasar Jul 11 '17 at 16:17
  • The mean should not be 66,500 since there are not only two (min and max) numbers. There are 14 numbers. – tcokyasar Jul 11 '17 at 17:04
  • @BradSolomon OP didn't mention uniform distribution at all. Random numbers in a range [a, b] can still have a mean c ≠ mean(a, b). Consider random() * random(). That will be in range [0, 1] but have a mean of 0.25, and all numbers will be completely random. I considered coming up with a fancy formula, but since OP didn't want the *distribution* to have the specific mean but the *sample*, that wouldn't have helped anyway. – Alfe Jul 12 '17 at 09:44
  • I realized that your formulation is not very effective for large values of `n`, e.g., 150. Any other suggestions? – tcokyasar Jul 12 '17 at 19:29
  • @user8028576 Please find my thoughts about larger numbers in my answer. – Alfe Jul 13 '17 at 09:56