0

I am trying to create a random set of 25 numbers, which are between 2 and 25, and sum up to 100 in python.

This Question gives an answer, but it seems that the maximum number never ends up being close to 25.

I've tried creating a list, dividing each number, and recreating the list, but it essentially nullifies my min and max values since they end up getting divided by a number larger than 1 almost all of the time:

numbers = np.random.randint(low = 2, high = 25, size = 100, dtype = int)
scale = 100 / sum(numbers) #We want weights to add up to 100%

#Scale values
for value in numbers:
    nums.append(value * scale)

Is there any way to do this? Thanks

  • 1
    Do you want 25 integers, or 25 numbers? In either case, if you have 25 numbers which add up to 100, the mean of the numbers is 4, and you would expect that the frequency of occurrence would drop off rapidly as you get farther away from the mean, so the probability of getting a value greater than 20 is pretty small. – rici Nov 22 '22 at 05:24
  • 1
    Another important question: What kind of distribution of the "random sets" do you want? I need to expand on this a bit at some point in my answer, but the bottom line is that my solution and Severin's solution produce very different distributions. My code produces a random list uniformly selected from the universe of possible lists. Severin's produces a multinomial distribution (over the individual values, not the sets), which concentrates the values a lot more around the mean. Julien's solution is almost a multinomial but it suffers from a very small bias because it doesn't use rejection. – rici Nov 22 '22 at 15:51
  • I haven't learned too much about stats, but it seems that some kind of random distribution works here (i.e I don't want the numbers to fit in to any specific pattern). Since I am generating so many of these sets, the probability that a given distribution (or something close to it) will be covered is high. – organicman276 Nov 22 '22 at 19:46
  • I'm not talking about the distribution of the numbers within your lists (although that also varies between the proposed solutions); I'm talking about the lists themselves. There are a lot of possible lists, and you're not going to come close to producing all of them. But the way you produce the lists does alter their composition (for example, the frequency of larger elements). – rici Nov 23 '22 at 00:13

3 Answers3

0

You haven't specified what probability distribution the numbers should have so this could be an easy valid way although very unlikely to yield numbers close to 25:

import numpy as np 
numbers = np.full(25,2)
while numbers.sum() < 100:
    i = np.random.randint(25)
    if numbers[i] < 25: # almost guaranteed...
        numbers[i] += 1
Julien
  • 13,986
  • 5
  • 29
  • 53
0

Suppose you want a random list of 25 numbers (not necessarily integers) which add up to 100, with the constraint that each number is at least 2 and no more than 25.

First, note that we can change that into an equivalent problem where the numbers are only required to be non-negative, by generating 25 numbers between 0 and 23 which add up to 100-25*2, which is 50. Once we have that list, we just add two to each number; the new list will be between 2 and 25, and its sum will be 100 (because we added 2 to each of 25 numbers).

The second thing to note is that the probability of finding a number close to 25 in that list is pretty small, since that would require one number to have attracted almost half of the available total. (That claim is clearer if you look at the alternative formulation, 25 numbers between 0 and 23 which add up to 50. If one of those numbers is, say, 20, then the other 24 numbers add up to 30, which means that you have a distribution which looks more like the distribution of wealth in an unregulated market than a uniform distribution.)

Since we're going to generate a uniform sample, we can handle the maximum value by ignoring it until we generate the random list, and then checking to see if the incredibly unlikely biased sample showed up; if it did, we just toss out the result and try again. (That's called "rejection sampling", and it's a pretty common technique, which works adequately even if half the samples are rejected. The advantage of rejection sampling is that it does not introduce bias.)

So let's get back to the question of how to generate a uniformly distributed list of non-negative numbers with a given sum. As long as the numbers are from a very large universe of possible values (like all double-precision floating point numbers within the range), that's quite easy. Say we need N numbers which add up to k. We start by randomly generating N-1 numbers, each in the range (0, k). Then we sort that set of numbers, and put 0 at one end of the sorted list and k at the other end. Finally, we compute the adjacent differences between successive elements. That gives us a list of N numbers which add up to k, and it turns out that the random lists so generated are an almost uniform sample of the possibilities. (There is a tiny bias introduced by the fact that it is possible for the same random number to have been generated twice, leading to a zero in the final list of differences. The zero is not a problem; the problem is that the zero shows up slightly too often. But the probability of getting an exact zero is less than one in a hundred million, and it's only exact zeros whose frequency is biased.)

In summary:

from random import uniform
def gen_random_list(N=25, k=100, min=2, max=25):
    assert(N * min <= k <= N * max)
    adjusted_k = k - min * N
    while True:
        endpoints = sorted(uniform(0, adjusted_k) for i in range(N - 1))
        values = [*(end - begin + min
                    for begin, end in zip([0] + endpoints,
                                          endpoints + [adjusted_k]))]
        if all(v <= max for v in values):
            return values

OK, what if we need a list of integers? In that case, it's much more likely that the above procedure will produce a zero, and the bias will be noticeable. To avoid that, we make two changes:

  • Instead of adjusting the range so that the minimum is 0, we adjust it so that the minimum is 1. (Which works for integers, because there are no integers between 0 and 1.) Now, the adjusted sum will be k' = k - N * (min - 1).
  • Second, instead of generating N - 1 independent random values, we generate a random selection of N - 1 different values from the half-open range [1, k') (using random.sample) Other than that, the algorithm is the same. Sort the generated list, compute adjacent differences, and verify that the maximum wasn't exceeded:
from random import sample
def gen_random_list_of_ints(N=25, k=100, min=2, max=25):
    assert(N * min <= k <= N * max)
    adjusted_k = k - (min - 1) * N
    while True:
        endpoints = sorted(sample(range(1, adjusted_k), N - 1))
        values = [*(end - begin + min - 1
                    for begin, end in zip([0] + endpoints,
                                          endpoints + [adjusted_k]))]
        if all(v <= max for v in values):
            return values
rici
  • 234,347
  • 28
  • 237
  • 341
  • DUDE!!!! This is amazing. I'm generating right around 1 million random lists and using to weight a portfolio of stocks, so the fact that numbers close to 25 don't show up often is fine, though your method seems to have them show up more than others I've tried. Seriously, I appreciate this so much. Thank you for taking the time out of your day – organicman276 Nov 22 '22 at 09:17
0

Simplest way to do that for integers is to use Multinomial distribution, which has nice property to sum up to desired number. First, take out minimum value to get range [0...s], and then just use multinomial and reject sample exceeding max value. You could play with probabilities array p to get desired behavior.

As already noted, mean value would be 4.

Code, Python 3.10, Windows x64

import numpy as np

N = 25

minv = 2
maxv = 25
summ = 100

def sampling(N, minv, maxv, summ):

    summa = summ - N*minv # fix range to [0...]
    p = np.full(N, 1.0/N) # probabilities, uniform

    while True:
        t = np.random.multinomial(summa, p, size=1) + minv # back to desired range
        if np.any(t > maxv): # check
            continue # and reject
        return t

q = sampling(N, minv, maxv, summ)

print(np.sum(q))

UPDATE

Mean value of Xi for multinomial is E(Xi)=n pi. In your case n=(100-25⎈2)=50. pi=1/25, so E(Xi)=50/25=2, and you have to add back 2, so what you see as mean would be 4.

But! You could change pi such that it is not equiprobable anymore. E.g. 5⎈[0.1] + 20⎈[0.5/20] will produce first five rv with mean 50⎈0.1+2=7 and last 20 with mean 50⎈0.5/20+2=1.25+2=3.25

Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • yes, I am now realizing that you can't really escape the distribution being skewed due to the mean, should have realized this. Thank you for the response bro, have a good day! – organicman276 Nov 22 '22 at 19:41
  • @organicman276 That's the beauty of the multinomial distribution - you could make things different by manipulating probabilities. See update for details – Severin Pappadeux Nov 22 '22 at 21:18