
I want to do something very similar to what is described in this answer: create a list of random numbers that sums to a given target value. If I did not care about the bounds, I could use what that answer suggests:

>>> print np.random.dirichlet(np.ones(10),size=1)
[[ 0.01779975  0.14165316  0.01029262  0.168136  0.03061161  0.09046587  0.19987289  0.13398581  0.03119906 0.17598322]]

However, I want to be able to control the range of each individual parameter as well as the target sum. I want to provide the bounds of each parameter: for instance, I would pass a list of three tuples, each specifying the lower and upper boundary of a uniform distribution. The target keyword argument would specify what the sum should add up to.

get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9)

The output could for example look like this:

[0.2, 0.2, 0.5]

How could that be achieved?

Update:

  1. Normalising, i.e. dividing by the sum of all random numbers, is not acceptable as it would distort the distribution.
  2. The solution should work with an arbitrary number of parameters / tuples.
  3. As mentioned in the comments, this question is actually very similar, but in another programming language.
n1000
  • I'm not sure how three numbers can add up to ten when the largest they can be is 1.0, 0.5, and 0.8 respectively. Can you give an example of what your function's output might look like? – Kevin Jul 20 '18 at 18:16
  • Ok, thanks, I think I get it now :-) – Kevin Jul 20 '18 at 18:23
  • Something like https://stackoverflow.com/questions/51325425/get-n-distinct-random-numbers-between-two-values-whose-sum-is-equal-to-a-given-n/ ? – Severin Pappadeux Jul 20 '18 at 19:20
  • @SeverinPappadeux Yes – nice catch. I thought I had seen all the questions on the issue by now. I am not sure if I will be able to implement this in Python on my own... – n1000 Jul 20 '18 at 19:27
  • ok, I'll write some code – Severin Pappadeux Jul 20 '18 at 19:37

2 Answers

from random import uniform

# Rejection sampling: draw a and b uniformly, derive c from the target
# sum, and retry until c falls within its own bounds.
while True:
    a = uniform(0.0, 1.0)
    b = uniform(0.2, 0.5)
    c = 0.9 - a - b
    if 0.3 < c < 0.8:
        break

print(a, b, c)

Just draw the first two numbers at random, then subtract them from the target to get the third 'random number'. Check that it satisfies its boundary conditions; if not, repeat.
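A sketch of how the same rejection idea could be extended to an arbitrary list of (low, high) tuples, using the function name and signature from the question (this is an illustration, not part of the original answer):

from random import uniform

def get_rnd_numbers(bounds, target, max_tries=100000):
    # Rejection sampling: draw all but the last value uniformly within
    # their bounds, derive the last value from the target sum, and
    # retry until it falls inside its own bounds.
    lo_last, hi_last = bounds[-1]
    for _ in range(max_tries):
        values = [uniform(lo, hi) for lo, hi in bounds[:-1]]
        last = target - sum(values)
        if lo_last <= last <= hi_last:
            return values + [last]
    raise RuntimeError("no admissible sample found within max_tries")

print(get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9))

As noted in the comments below, the derived last value is not uniformly distributed, so this only generalises the mechanics of the approach, not its distributional properties.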

Bayko
  • Wouldn't this change the distribution of c? I thought about randomising the order of parameters before. I am sorry I was not so clear, on this, but ideally I am looking for something that can work with an arbitrary number of parameters. – n1000 Jul 20 '18 at 19:35
  • Your distribution of c is not completely random. It is dependent on a and b. The moment you find a and b your c is 'fixed'. – Bayko Jul 20 '18 at 19:43
  • Can you explain why you think this cannot be extended to an arbitrary number of tuples? You will just have to put more statements inside the while loop – Bayko Jul 20 '18 at 19:47
  • I was thinking in terms of a function that can accept different inputs like `get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8), ...], target=0.9)` – n1000 Jul 20 '18 at 20:07
  • Here is a histogram for the three random variables (n=10000): https://imgur.com/a/vdJWjr8 Unfortunately, they are not uniformly distributed and the first value (blue) does not use the full space (0.0, 1.0). – n1000 Sep 02 '18 at 20:23

Ok, here is some idea/code to play with.

We will sample from a Dirichlet distribution, so the sum objective is automatically fulfilled.

Then to each x_i sampled from the Dirichlet we apply a linear transformation with a different lower boundary l_i but the same scaling parameter s:

v_i = l_i + s*x_i

From the summation objective (Σ_i means summation over i) and the fact that Dirichlet samples always sum to 1,

Σ_i v_i = target

we can compute s:

s = target - Σ_i l_i
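
For example, with the bounds from the question, s = 0.9 - (0.0 + 0.2 + 0.3) = 0.4.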

Let's put the mean value of each v_i right in the middle of its interval:

E[v_i] = l_i + s*E[x_i] = (l_i + h_i) / 2

E[x_i] = (h_i - l_i) / (2*s)

And let's introduce a knob that is basically proportional to the inverse variance of the Dirichlet: the bigger the knob, the tighter the sampled random values are around the mean.

So for the Dirichlet distribution's array of alpha parameters,

alpha_i = E[x_i] * vscale

where vscale is a user-defined variance scale factor. We check whether the sampled values violate the lower or upper boundary conditions and reject the sample if they do.
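
For example, with the bounds from the question, s = 0.4 and vscale = 3.0, this gives E[x_i] = (1.25, 0.375, 0.625) and alpha_i = (3.75, 1.125, 1.875).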

Code, Python 3.6, Anaconda 5.2

import numpy as np

boundaries = np.array([[0.0, 1.0], [0.2, 0.5], [0.3, 0.8]])
target = 0.9

def get_rnd_numbers(boundaries, target, vscale):
    lo = boundaries[:, 0]
    hi = boundaries[:, 1]

    # Common scaling factor from the summation objective.
    s = target - np.sum(lo)

    # Dirichlet alphas: mean in the middle of each interval,
    # spread controlled by vscale.
    alpha_i = (0.5 * (hi - lo) / s) * vscale
    print(np.sum(alpha_i))  # diagnostic output

    # Sample from the Dirichlet and apply the linear transformation.
    x_i = np.random.dirichlet(alpha_i, size=1)
    v_i = lo + s * x_i

    # Flag samples that fall outside the requested boundaries.
    good_lo = not np.any(v_i < lo)
    good_hi = not np.any(v_i > hi)

    return (good_lo, good_hi, v_i)

vscale = 3.0

# Draw a few samples and keep only those that respect the boundaries.
for _ in range(3):
    gl, gh, v = get_rnd_numbers(boundaries, target, vscale)
    print((gl, gh, v, np.sum(v)))
    if gl and gh:
        print("Good sample, use it")

You could play with different transformation ideas, and maybe remove or replace the mean condition with something more sensible. I would advise keeping the idea of the knob, so that you can tighten the sampling spread.
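If a draw violates the boundaries, it can simply be redrawn. A minimal retry wrapper around the get_rnd_numbers above (an illustration added here, not part of the original answer) could look like this:

def sample_until_valid(boundaries, target, vscale, max_tries=1000):
    # Redraw until both boundary checks pass, or give up after max_tries.
    for _ in range(max_tries):
        good_lo, good_hi, v_i = get_rnd_numbers(boundaries, target, vscale)
        if good_lo and good_hi:
            return v_i
    raise RuntimeError("no admissible sample found within max_tries")

print(sample_until_valid(boundaries, target, vscale=3.0))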

Severin Pappadeux
  • Finally I came around to implement your idea (thanks again). However, it seems like this will actually not give uniform distributions... Find a histogram for the three variables from `get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9, vscale=3.0)` here: https://imgur.com/a/2x9MajD – n1000 Sep 02 '18 at 19:44
  • @n1000 Sorry, a bit late coming back. I'm not sure it ever could be a uniform - from Dirichlet wiki page you could see that even in simplest case when all a_i=1, expected value E[X] = 1/n,n=3 so it would be 0.333, far from 0.5 required for uniform. Variance is different as well. `n` going up means each mean value is decreasing, and it is easy to understand why - sum condition suppress mean. – Severin Pappadeux Sep 04 '18 at 21:20