5

This question has been asked before, but I've never really seen a good answer.

  1. I want to generate 8 random numbers that sum to 0.5.

  2. I want each number to be randomly chosen from a uniform distribution (ie, the simple function below will not work because the numbers will not be uniformly distributed).

    def rand_constrained(n,tot):
        r = [random.random() for i in range(n)]  
        s = sum(r)
        r = [(i/s*tot) for i in r] 
        return r
    

The code should be generalizable, so that you can generate N uniform random numbers that sum to M (where M is a positive float). If possible, can you please also explain (or show with a plot) why your solution generates random numbers uniformly in the appropriate range?

Related questions that miss the mark:

Generate multiple random numbers to equal a value in python (current accepted answer isn't uniform--another answer which is uniform only works with integers)

Getting N random numbers that the sum is M (same question in Java, currently accepted answer is just plain wrong, also no answers with uniform distribution)

Generate N random integers that sum to M in R (same question, but in R with a normal--not uniform--distribution)

Any help is greatly appreciated.

Community
  • 1
  • 1
spacetyper
  • 1,471
  • 18
  • 27
  • you may want to toss this one to the math stackexchange folks. – user1269942 Jun 05 '15 at 05:30
  • This question suffers the usual problem of not clearly specifying the joint distribution required. Why are you even sure the properties you want can hold simultaneously? – user2357112 Jun 05 '15 at 05:45
  • 1
    http://stats.stackexchange.com/questions/14059/generate-uniformly-distributed-weights-that-sum-to-unity – Émilien Tlapale Jun 05 '15 at 05:50
  • @Xīcò: While that question is quite relevant (and much more rigorously posed than this one), the properties required of the distribution in that question are not the same as the properties required here. In particular, while the overall vector is uniformly distributed over all possible vectors, none of the components are uniformly distributed. – user2357112 Jun 05 '15 at 05:54
  • @user2357112 I'm _not_ sure that the properties I want can hold simultaneously. I was thinking about this a lot last night and realized this isn't a trivial problem. Perhaps it's not possible. – spacetyper Jun 05 '15 at 15:27
  • You might have missed: http://stackoverflow.com/questions/18600348/how-do-i-generate-random-numbers-in-an-array-that-add-up-to-a-defined-total/18600737#18600737 – Lee Daniel Crocker Jun 05 '15 at 17:49
  • @LeeDanielCrocker I think OP is asking for (a) uniformity of the individual values, while simultaneously (b) maintaining uniformity of the set of values over the range [0,M]. The answer you linked is a good one for (b), but doesn't have property (a). However, given that the sum of two U's has a triangular distribution, and summing more U's starts looking more and more Gaussian, what the OP seems to be asking for isn't mathematically possible. – pjs Jun 05 '15 at 18:38
  • @LeeDanielCrocker I did miss that, thanks. I'll take a look at it. – spacetyper Jun 05 '15 at 19:10

5 Answers5

4

What you are asking for seems to be impossible.

However, I will re-interpret your question so that it makes more sense and is possible to solve. What you need is a probability distribution on the seven-dimensional hyperplane x_1 + x_2 + ... + x_8 = 0.5. Since the hyperplane is infinite in extent, a uniform distribution on the whole hyperplane will not work. What you probably(?) want is the chunk of the hyperplane where all the x_i>0. That region is a simplex, a generalization of a triangle, and the uniform distribution on the simplex is a special case of the Dirichlet Distribution.

You may find this section of the Dirichlet Distribution Wikipedia article, string cutting, particularly illuminating.

Implementation

The Wikipedia article gives the following implementation in Python in the Random Number Generation section:

params = [a1, a2, ..., ak]
sample = [random.gammavariate(a,1) for a in params]
sample = [v/sum(sample) for v in sample]

What you probably(?) want is the case where all ai=1 which results in a uniform distribution on the simplex. Here k corresponds to the number N in your question. To get the samples to sum to M instead of 1, just multiply sample by M.

Update

Thanks to Severin Pappadeux for pointing out that gammavariate can return infinity under rare circumstances. That is mathematically "impossible" but can occur as an artifact of the implementation in terms of floating point numbers. My suggestion for handling that case is to check for it after sample is first calculated; if any of the components of sample are infinity, set all the non-infinity components to 0 and set all the infinity components to 1. Then when the xi are calculated, outcomes like xi=1, all other x's=0, or xi=1/2, xj=1/2, all other x's=0 will result, collectively "corner samples" and "edge samples".

Another very-low-probability possibility is for the sum of the gammavariates to overflow. I would guess that we could run through the entire underlying pseudo-random number sequence and not see that happen, but theoretically it is possible (depending on the underlying pseudorandom number generator). The situation could be handled by rescaling sample, e.g., dividing all elements of sample by N, after the gammavariates have been calculated but before the x's are calculated. Personally, I wouldn't bother because the odds are so low; program crashes due to other reasons would have higher probability.

Edward Doolittle
  • 4,002
  • 2
  • 14
  • 27
  • Thanks for the comment about looking for a simplex on a seven dimensional hyperplane, it's more intuitive to think about a solution that way. Still trying to work out how I could implement that though. – spacetyper Jun 05 '15 at 15:32
  • I've added an implementation section to my answer. Test it out and let us know if it gives the behaviour you expect. – Edward Doolittle Jun 05 '15 at 16:20
  • http://stackoverflow.com/questions/18600348/how-do-i-generate-random-numbers-in-an-array-that-add-up-to-a-defined-total/18600737#18600737 – Lee Daniel Crocker Jun 05 '15 at 17:51
  • and how do you handle infinity in gammavariate ? – Severin Pappadeux Jun 05 '15 at 21:18
  • @LeeDanielCrocker Same thing ... Dirichlet distribution does it without the sorting step. Plus parameters in Dirichlet distribution can be adjusted if slightly different behaviour is desired. – Edward Doolittle Jun 05 '15 at 22:40
  • @SeverinPappadeux gammavariate could potentially return very large numbers but not infinity itself. The probability of getting "very large" numbers from gammavariate vanishes exponentially, so is essentially zero for practical purposes. My only concern would be getting an overflow in `sum(sample)`; the probability of that could be worked out for various N; I suspect it's basically 0 except for huge values of N. – Edward Doolittle Jun 05 '15 at 22:45
  • @EdwardDoolittle well, at a=1 gammavariate is exponential distribution, which is sampled via log(-U(0,1)). At U(0,1)=0 you're in trouble. I don't know if python gammavariate have some provision for this. In my answer below I handle such case correctly i believe, take a look. – Severin Pappadeux Jun 06 '15 at 01:34
  • Ah, I see what you mean. I don't know enough about Python to comment, either other than to agree that it is theoretically possible, depending on gammavariate implementation. One of the gammavariates equal to infinity should mean that the corresponding xi=1 "corner sample" (if there are two gammavariates equal to infinity there would be x1=1/2 and xj=1/2, etc. That would be rare to the point of being pretty much impossible, but we can't count it out unless we know more about the implementation of gammavariate ... I note your code does not handle that case). See my edited answer for a solution. – Edward Doolittle Jun 06 '15 at 07:07
2

Instead of selecting 'n' numbers from a uniform distribution that sum to 'M', we can select 'n-1' random interval from a uniform distribution that range '0-M' then we can extract the intervals.

from random import uniform as rand

def randConstrained(n, M):
     splits = [0] + [rand(0, 1) for _ in range(0,n-1)] + [1]
     splits.sort()
     diffs = [x - splits[i - 1] for i, x in enumerate(splits)][1:]
     result = map(lambda x:x*M, diffs)
     return result

res = randConstrained(8,0.5)
print res
print sum(res)

Output

[0.0004411388173262698,
 0.014832306834761111,
 0.009695872790939863,
 0.04539251424140245,
 0.18791325446494067,
 0.07615024971405443,
 0.07792489571128014,
 0.08764976742529507]

0.5
Kavin Eswaramoorthy
  • 1,595
  • 11
  • 19
  • 4
    It's crucial to note that none of the individual components of `res` is uniformly distributed with this solution. – user2357112 Jun 05 '15 at 06:14
  • Precisely. http://stackoverflow.com/questions/18600348/how-do-i-generate-random-numbers-in-an-array-that-add-up-to-a-defined-total/18600737#18600737 – Lee Daniel Crocker Jun 05 '15 at 17:52
0

For a fully generalized solution ("I want n numbers between low and high, that sum to m):

from random import uniform as rand

def randConstrained(n, m, low, high):
    tot = m
    if not low <= 0 <= high:
        raise ValueError("Cannot guarantee a solution when the input does not allow for 0s")
    answer = []
    for _ in range(n-1):
        answer.append(low + rand(0,tot) * (high-low))
        tot -= answer[-1]
    answer.append(m-sum(answer))
    return answer

For your case, this can be used as follows:

In [35]: nums = randConstrained(8, 0.5, 0, 1)

In [36]: nums
Out[36]: 
[0.2502590281277123,
 0.082663797709837,
 0.14586995648173873,
 0.011270073049224807,
 0.009328970756471237,
 0.00021993111786291258,
 0.0001831479074098452,
 0.000205094849743237]
inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
  • `random.random` doesn't take arguments. – user2357112 Jun 05 '15 at 05:38
  • 4
    Better, but the handling of `low` and `high` is wrong, and the distribution systematically puts too much of the weight in the first numbers generated. – user2357112 Jun 05 '15 at 05:44
  • 1
    @inspectorG4dget As OP requested each number is not chosen from a random distribution for example you are choosing the last number directly. – Kavin Eswaramoorthy Jun 05 '15 at 05:47
  • 1
    It would be impossible to guarantee that a set a randomly chosen numbers would add up to a required sum. The determinism of at least one number is inescapable. The other possibility is to allow the use of negative numbers – inspectorG4dget Jun 05 '15 at 05:49
0

It is known as simplex sampling, which is closely related to Dirichlet distribution. Sum(x_i) = 1, where each x_i is U(0,1). In your case after simplex sampling just multiply each x_i by 0.5.

Anyway, translating c++ code from https://github.com/Iwan-Zotow/SimplexSampling into python (hopefully, not too many errors)

And it handles infinity just right

def simplex_sampling(n):
    r = []
    sum = 0.0
    for k in range(0,n):
        x = random.random()
        if x == 0.0:
            return (1.0, make_corner_sample(n, k))

        t = -math.log(x)
        r.append(t)
        sum += t

     return (sum, r)

def make_corner_sample(n, k):
    r = []
    for i in range(0, n):
        if i == k:
            r.append(1.0)
        else:
            r.append(0.0)

    return r

 # main
 sum, r = simplex_sampling(8)

 norm = 0.5 / sum # here is your 0.5 total

 for k in range(0, 8):
     r[k] *= norm
Severin Pappadeux
  • 18,636
  • 3
  • 38
  • 64
  • This is good because it handles the most likely exceptional case of a single gammavariate = infinity. I identified some other exceptional cases in my answer but the odds of those other cases occurring is vanishingly small. – Edward Doolittle Jun 06 '15 at 16:35
0

Same thing as k4vin's solution but without an import error which I'm getting on random.uniform.

import random

def rand_constrained(n, total):
    # l is a sorted list of n-1 random numbers between 0 and total
    l = sorted([0] + [total * random.random() for i in range(n - 1)] + [total])
    # Return the intervals between each successive element
    return [l[i + 1] - l[i] for i in range(n)]

print(rand_constrained(3, 10))
# [0.33022261483938276, 8.646666440311822, 1.0231109448487956]

But matplotlib is choking on installation so I can't plot it out right now.

mgbelisle
  • 571
  • 5
  • 8