How do I assign random values in an array of n elements satisfying following constraints?

Question

Suppose a is my array of n elements.

Constraints:

alpha <= a[i] <= beta.
0 <= alpha <= beta <= 1.
Sum of all elements in a should be exactly 1.

Assuming such a array is always possible for given value of alpha and beta

This is what I am thinking:

Randomly assign values between alpha and beta. Find sum of array elements. If sum>1, subtract all elements with x such that constraints are satisfied. If sum<1, add all elements with x such that constraints are satisfied.

But I am finding it difficult to find out the newly mapped values.

May a[i] and a[j] be the same? They can't be completely random, if you want their sum to be 1. So you can add random numbers that are ``alpha < randomNumber <= beta - sum of all previous numbers`` until you reach the point when ``1 - sum of all previous numbers < alpha`` Then replace the last number added by ``1 - sum of all previous numbers < alpha``. — ecth, Nov 05 '15 at 07:16
Do you have to randomly assign all values? This means that the chance that the sum of all cells in `a` will be exactly 1 is kinda low. If you can choose the last value, that makes things easier — shapiro yaacov, Nov 05 '15 at 07:17
If alpha = 0.4, beta = 1, and n=2, then the values will all be in the range [0.4,0.6], right? @shapiroyaacov: I think by the definition of the task (with constraint) there will be some dependency between the elements. In particular, you can say that the last value is somewhat chosen. — justhalf, Nov 05 '15 at 07:18
What @ecth wrote seems like a good simple idea. I think by "random" here OP means "uniform distribution over possible space". I think to ensure uniform distribution, you will need to do additional step after ecth's method, which is to shuffle the array. — justhalf, Nov 05 '15 at 07:19
We can expect little tolerance on sum of all elements equal to 1. — Piyush Agarwal, Nov 05 '15 at 07:21
@ecth sems to me that your proposal would lead to a tendency for the last few numbers to be small. As I understand the question, all acceptable configurations should have the same probability. — Henry, Nov 05 '15 at 07:24
@justhalf If alha = 0.4 and beta = 1 , n=2 then the values will be in range [0.4, 1] one possible solution is 0.4 , 0.6 . Other solution : 0.55, 0.45 — Piyush Agarwal, Nov 05 '15 at 07:25
@Henry Yes, but this is only valid for n = 2. Now it's not possible to generalize for n>2 — Piyush Agarwal, Nov 05 '15 at 07:36
@PiyushAgarwal the maximum allowed value is min(beta, alpha + (1 - n*alpha)), a similar relation holds for the minimum allowed value. — Henry, Nov 05 '15 at 07:47
There is a hidden requirement here. If we look at the range of numbers there should be at least one element which is less than `(1/n)`, else the sum will be `> 1`. So we have to force `alpha <= 1/n`. Similarly we will need to force `beta >= 1/n`. Are these implicit in the problem ? — user1952500, Nov 05 '15 at 07:47
@user1952500: Yes, since OP says "Assuming such a array is always possible for given value of alpha and beta" — justhalf, Nov 05 '15 at 07:48
@HighPerformanceMark I don't know how http://stackoverflow.com/questions/3959021/non-biased-return-a-list-of-n-random-positive-numbers-0-so-that-their-sum satisfies the constraint alpha <= a[i] although the solutions specified does satisfy the constraints a[i] <= beta and other constraints — Piyush Agarwal, Nov 05 '15 at 07:49
Is the randomness an actual requirement or an element of a proposed solution ? — H H, Nov 05 '15 at 08:38

bames53 · Accepted Answer · 2015-11-06T06:47:45.003

This can be satisfied iff alpha × n ≤ 1 ≤ beta × n.

You can do this by picking n random numbers. Let r_i be the i^th random number, let R be the sum of the random numbers and S = (beta - alpha)/(1-n×alpha) × R. Set a[i] = alpha + r_i/S * (beta - alpha). As long as S is not less than the maximum random number then all elements of a are between alpha and beta and their sum is 1.

#include <iostream>
#include <cassert>
#include <random>
#include <vector>
#include <algorithm>
#include <iterator>

int main() {
    const double alpha = 0.05, beta = 0.15;
    const int n = 10;

    if (!(n * alpha <= 1.0 && 1.0 <= n * beta)) {
        std::cout << "unable to satisfy costraints.\n";
        return 0;
    }

    if (alpha == beta || std::fabs(1.0 - n * alpha) < 1e-6 || std::fabs(1.0 - n * beta) < 1e-6) {
        std::cout << "Solution for alpha = " << alpha << ", beta = " << beta << " n = " << n << ":\n";
        std::fill_n(std::ostream_iterator<double>(std::cout, " "), n, 1.0/n);
        std::cout << "\nSum: 1.0\n";
        return 0;
    }

    std::vector<int> r;
    double S;

    {
        std::random_device rd;
        std::seed_seq seed{rd(), rd(), rd(), rd(), rd(), rd(), rd(), rd()};
        std::mt19937 eng(seed);
        std::uniform_int_distribution<> dist(0, 1000);

        do {
            r.clear();
            std::generate_n(back_inserter(r), n, std::bind(dist, std::ref(eng)));

            int sum = std::accumulate(begin(r), end(r), 0);
            S = (beta - alpha) / (1 - n * alpha) * sum;
        } while (S < *std::max_element(begin(r), end(r)));

    }

    std::vector<double> a(n);
    std::transform(begin(r), end(r), begin(a), [&] (int ri) { return alpha + ri/S * (beta - alpha); });

    for (double ai : a) {
        assert(alpha <= ai && ai <= beta);
    }

    std::cout << "Solution for alpha = " << alpha << ", beta = " << beta << " n = " << n << ":\n";
    std::copy(begin(a), end(a), std::ostream_iterator<double>(std::cout, " "));
    std::cout << '\n';
    std::cout << "Sum: " << std::accumulate(begin(a), end(a), 0.0) << '\n';
}

Solution for alpha = 0.05, beta = 0.15, n = 10:
0.073923 0.117644 0.0834555 0.139368 0.101696 0.0846471 0.115261 0.0759395 0.0918882 0.116178
Sum: 1

You didn't specify the particular distribution you wanted the algorithm to provide, but I thought I'd point out that this algorithm does not necessarily produce every possible solution with equal probability; I believe some solutions are more likely to be produced than others.

my answer seems to be identical to yours. I was lost in thought and formatting as I worked on this and did not see that you had posted your solution. Hopefully my answer can add some insight into the choice of the constant and the general reasoning behind it. — user1952500, Nov 05 '15 at 08:41
Thanks. I haven't read the other two answers yet. But I think, this is the solution which I was looking for. — Piyush Agarwal, Nov 05 '15 at 09:16
@bames53: why is this solution not uniformly distributed ? I ask because my solution is essentially the same. The numbers are uniformly distributed in (0, constant) and those are mapped into (beta-alpha) range with an offset of alpha. So why should there be any sort of bias ? — user1952500, Nov 05 '15 at 19:05
@user1952500 The way I'm thinking of it, it's sort of like the reason that picking two values uniformly distributed from 0 to 1 and treating them like a 2d vector doesn't produce a uniform distribution of directions. Instead of any direction between [0, pi/2] being equally likely, the directions will tend to cluster around the middle of that range. — bames53, Nov 05 '15 at 20:49
I think (I'm not sure) that this solution has a similar sort of problem because we're picking points uniformly distributed across a hypercube volume and then sort of scaling those points to fit on the hyperplane of the solution. It seems to me like the solutions produced by this algorithm should cluster toward the middle — bames53, Nov 05 '15 at 20:57

user1952500 · Answer 2 · 2015-11-05T18:40:08.737

(As I mentioned in the comment, there is a hidden requirement here. If we look at the range of numbers there should be at least one element <= (1/n), else the sum will be > 1. So we have to force alpha <= 1/n. Similarly we will need to force beta >= 1/n.)

We could do the following:

Let x_i = 0 for i = 0.
Choose a random number in [x_i, C]. Let it be x_(i+1). We will describe this constant C below.
Do the above n-1 times to get {x_1, x_2, ...., x_(n-1)}.
Now the numbers {(x_1 - x_0), ..., (1 - x_(n-1))} are positive and sum to C. Let these be {y_1, y_2, ..., y_n}.
Now consider k_i = alpha + y_i * (beta - alpha). Since y_i is in (0, C), k_i will also be between alpha and beta if C <= 1.
Summation of k_i is n * alpha + (beta - alpha) * (sum of y_i). This is the same as n * alpha + (beta - alpha) * C. This should be equal to 1.

Solving for C, we get C = (1 - n * alpha) / (beta - alpha).

To illustrate: if alpha = 0.4, beta = 1, n = 2, we get: C = (1 - 0.8) / 0.6 = 1/3

Random values for x_i are, say: {1/8} (since we only care until (n-1) = 1)

y_i = {1/8, (1/3 - 1/8)} = {1/8, 5/24}

beta - alpha = 0.6

k_i = {0.4 + 0.6/8, 0.4 + 3/24} = {0.475, 0.525}

Hence the required array is {0.475 and 0.525} which sums to 1.

The underlying assumption here is that C < 1. I think we may be able to prove that C < 1 is a strong requirement for the feasibility of the problem, but it's getting too late here to prove this now.

Note that the above yields uniformly random numbers since the numbers x_i are chosen randomly in the range (0, C).

justhalf · Answer 3 · 2015-11-05T08:16:46.580

I realized that the problem itself is not very straightforward, but I found a simple algorithm which I think (not proven yet) will generate uniform distribution over all possible values. I guess this algorithm must have been considered by some people somewhere, since it's quite simple, but I haven't tried to look for the analysis of this algorithm. The algorithm is as follows:

Iterate from 1 to n
At each step, find out the real lowerbound and upperbound by assuming that the values after this point onwards will have maximum value (for lowerbound) or minimum value (for upperbound).
Generate a uniformly distributed number between the bounds (if there is no number satisfying the bound, then the constraint is unsatisfiable)
Add the generated number to an array
At the end, shuffle the resulting array

Code:

# -*- coding: utf-8 -*-
"""
To produce an array of random numbers between alpha and beta, which sums to 1
Assumption: there is always a solution for the given constraint.
"""

# Import statements
import random
import sys

def shuffle(arr):
    result = []
    while len(arr) > 0:
        idx = random.randint(0, len(arr)-1)
        result.append(arr[idx])
        del(arr[idx])
    return result

def main():
    if len(sys.argv) < 4:
        print('Usage: random_constrained.py <n> <alpha> <beta>\nWhere:\n\tn is an integer\n\t0 <= alpha <= beta <= 1')
        sys.exit(0)
    n = int(sys.argv[1])
    alpha = float(sys.argv[2])
    beta = float(sys.argv[3])
    result = []
    total = 0
    for i in range(n):
        low = max(alpha, 1-total-((n-i-1)*beta))
        high = min(beta, 1-total-((n-i-1)*alpha))
        if high < low - 1e-10:
            print('Error: constraint not satisfiable (high={:.3f}, low={:.3f})'.format(high, low))
            sys.exit(1)
        num = random.uniform(low, high)
        result.append(num)
        total += num
    result = shuffle(result)
    print(result)

if __name__ == '__main__':
    main()

Sample outputs:

$ python random_constrained.py 4 0 0.5
[0.06852504971359885, 0.39391285249108765, 0.24215492185626314, 0.2954071759390503]
$ python random_constrained.py 4 0 0.5
[0.2519926400714304, 0.4138640296394964, 0.27906367876610466, 0.055079651522968565]
$ python random_constrained.py 4 0 0.5
[0.11505150404455633, 0.16665881845206237, 0.45371668123772924, 0.264572996265652]
$ python random_constrained.py 4 0 0.5
[0.31689744182294444, 0.11233051635974067, 0.3599600067081529, 0.21081203510916197]
$ python random_constrained.py 4 0 0.5
[0.16158825078700828, 0.18989326608974527, 0.1782112102703714, 0.470307272852875]

$ python random_constrained.py 5 0 0.2
[0.19999999999999998, 0.2, 0.19999999999999996, 0.19999999999999996, 0.20000000000000004]
$ python random_constrained.py 5 0 0.2
[0.2, 0.2, 0.19999999999999998, 0.2, 0.19999999999999996]
$ python random_constrained.py 5 0 0.2
[0.20000000000000004, 0.19999999999999998, 0.19999999999999996, 0.19999999999999998, 0.2]
$ python random_constrained.py 5 0 0.2
[0.2, 0.20000000000000004, 0.19999999999999996, 0.19999999999999996, 0.19999999999999996]

$ python random_constrained.py 2 0.4 1
[0.5254259945319483, 0.47457400546805173]
$ python random_constrained.py 2 0.4 1
[0.5071103628251259, 0.4928896371748741]
$ python random_constrained.py 2 0.4 1
[0.4595236988530377, 0.5404763011469623]
$ python random_constrained.py 2 0.4 1
[0.44218002983240046, 0.5578199701675995]
$ python random_constrained.py 2 0.4 1
[0.4330169754142243, 0.5669830245857757]
$ python random_constrained.py 2 0.4 1
[0.543183373724851, 0.45681662627514896]

I'm not sure yet how to handle the precision error there.

I ran some tests to check the uniformity in case n=2, generating 100000 arrays (with alpha=0.4, beta=0.6) and get the values into 10 buckets of equal size from alpha to beta, counting the number of occurrences:

First number: [9998, 9966, 9938, 9952, 10038, 10161, 9899, 10007, 10054, 9987]
Second number: [9987, 10054, 10007, 9899, 10161, 10038, 9952, 9938, 9966, 9998]

For n=4, alpha=0, beta=0.3, 10000 tries:

[0, 0, 0, 304, 430, 569, 730, 1135, 1874, 4958]
[0, 0, 0, 285, 492, 576, 805, 1113, 1775, 4954]
[0, 0, 0, 248, 465, 578, 769, 1077, 1839, 5024]
[0, 0, 0, 252, 474, 564, 800, 1100, 1808, 5002]

We can see that each number has more or less the same distribution, so there is no bias towards any position.

How do I assign random values in an array of n elements satisfying following constraints?

3 Answers3