I realized that the problem itself is not very straightforward, but I found a simple algorithm which I think (not proven yet) will generate uniform distribution over all possible values. I guess this algorithm must have been considered by some people somewhere, since it's quite simple, but I haven't tried to look for the analysis of this algorithm. The algorithm is as follows:
- Iterate from 1 to n
- At each step, find out the real lowerbound and upperbound by assuming that the values after this point onwards will have maximum value (for lowerbound) or minimum value (for upperbound).
- Generate a uniformly distributed number between the bounds (if there is no number satisfying the bound, then the constraint is unsatisfiable)
- Add the generated number to an array
- At the end, shuffle the resulting array
Code:
# -*- coding: utf-8 -*-
"""
To produce an array of random numbers between alpha and beta, which sums to 1
Assumption: there is always a solution for the given constraint.
"""
# Import statements
import random
import sys
def shuffle(arr):
result = []
while len(arr) > 0:
idx = random.randint(0, len(arr)-1)
result.append(arr[idx])
del(arr[idx])
return result
def main():
if len(sys.argv) < 4:
print('Usage: random_constrained.py <n> <alpha> <beta>\nWhere:\n\tn is an integer\n\t0 <= alpha <= beta <= 1')
sys.exit(0)
n = int(sys.argv[1])
alpha = float(sys.argv[2])
beta = float(sys.argv[3])
result = []
total = 0
for i in range(n):
low = max(alpha, 1-total-((n-i-1)*beta))
high = min(beta, 1-total-((n-i-1)*alpha))
if high < low - 1e-10:
print('Error: constraint not satisfiable (high={:.3f}, low={:.3f})'.format(high, low))
sys.exit(1)
num = random.uniform(low, high)
result.append(num)
total += num
result = shuffle(result)
print(result)
if __name__ == '__main__':
main()
Sample outputs:
$ python random_constrained.py 4 0 0.5
[0.06852504971359885, 0.39391285249108765, 0.24215492185626314, 0.2954071759390503]
$ python random_constrained.py 4 0 0.5
[0.2519926400714304, 0.4138640296394964, 0.27906367876610466, 0.055079651522968565]
$ python random_constrained.py 4 0 0.5
[0.11505150404455633, 0.16665881845206237, 0.45371668123772924, 0.264572996265652]
$ python random_constrained.py 4 0 0.5
[0.31689744182294444, 0.11233051635974067, 0.3599600067081529, 0.21081203510916197]
$ python random_constrained.py 4 0 0.5
[0.16158825078700828, 0.18989326608974527, 0.1782112102703714, 0.470307272852875]
$ python random_constrained.py 5 0 0.2
[0.19999999999999998, 0.2, 0.19999999999999996, 0.19999999999999996, 0.20000000000000004]
$ python random_constrained.py 5 0 0.2
[0.2, 0.2, 0.19999999999999998, 0.2, 0.19999999999999996]
$ python random_constrained.py 5 0 0.2
[0.20000000000000004, 0.19999999999999998, 0.19999999999999996, 0.19999999999999998, 0.2]
$ python random_constrained.py 5 0 0.2
[0.2, 0.20000000000000004, 0.19999999999999996, 0.19999999999999996, 0.19999999999999996]
$ python random_constrained.py 2 0.4 1
[0.5254259945319483, 0.47457400546805173]
$ python random_constrained.py 2 0.4 1
[0.5071103628251259, 0.4928896371748741]
$ python random_constrained.py 2 0.4 1
[0.4595236988530377, 0.5404763011469623]
$ python random_constrained.py 2 0.4 1
[0.44218002983240046, 0.5578199701675995]
$ python random_constrained.py 2 0.4 1
[0.4330169754142243, 0.5669830245857757]
$ python random_constrained.py 2 0.4 1
[0.543183373724851, 0.45681662627514896]
I'm not sure yet how to handle the precision error there.
I ran some tests to check the uniformity in case n=2
, generating 100000 arrays (with alpha=0.4, beta=0.6
) and get the values into 10 buckets of equal size from alpha to beta, counting the number of occurrences:
First number: [9998, 9966, 9938, 9952, 10038, 10161, 9899, 10007, 10054, 9987]
Second number: [9987, 10054, 10007, 9899, 10161, 10038, 9952, 9938, 9966, 9998]
For n=4, alpha=0, beta=0.3
, 10000 tries:
[0, 0, 0, 304, 430, 569, 730, 1135, 1874, 4958]
[0, 0, 0, 285, 492, 576, 805, 1113, 1775, 4954]
[0, 0, 0, 248, 465, 578, 769, 1077, 1839, 5024]
[0, 0, 0, 252, 474, 564, 800, 1100, 1808, 5002]
We can see that each number has more or less the same distribution, so there is no bias towards any position.