
I want to do something very similar to what is described in this answer: create a list of random numbers that sums to a given target value. If I did not care about the bounds, I could use what that answer suggests:

>>> print np.random.dirichlet(np.ones(10),size=1)
[[ 0.01779975  0.14165316  0.01029262  0.168136  0.03061161  0.09046587  0.19987289  0.13398581  0.03119906 0.17598322]]

However, I want to be able to control the range of each individual parameter as well as the target sum. I want to provide the bounds of each parameter: for instance, I would pass a list of three tuples, each specifying the lower and upper boundary of a uniform distribution. The target keyword argument would specify what the sum should add up to.

get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9)

The output could for example look like this:

[0.2, 0.2, 0.5]

How could that be achieved?

Update:

  1. Normalising, i.e. dividing by the sum of all random numbers, is not acceptable as it would distort the distribution.
  2. The solution should work with an arbitrary number of parameters / tuples.
  3. As mentioned in the comments, this question is actually very similar, but in another programming language.
n1000
  • I'm not sure how three numbers can add up to ten when the largest they can be is 1.0, 0.5, and 0.8 respectively. Can you give an example of what your function's output might look like? – Kevin Jul 20 '18 at 18:16
  • Ok, thanks, I think I get it now :-) – Kevin Jul 20 '18 at 18:23
  • Something like https://stackoverflow.com/questions/51325425/get-n-distinct-random-numbers-between-two-values-whose-sum-is-equal-to-a-given-n/ ? – Severin Pappadeux Jul 20 '18 at 19:20
  • @SeverinPappadeux Yes – nice catch. I thought I had seen all the questions on the issue by now. I am not sure if I will be able to implement this in Python on my own... – n1000 Jul 20 '18 at 19:27
  • ok, I'll write some code – Severin Pappadeux Jul 20 '18 at 19:37

2 Answers

from random import uniform

# Rejection sampling: draw a and b uniformly, derive c from the target
# sum, and retry until c falls within its own bounds.
while True:
    a = uniform(0.0, 1.0)
    b = uniform(0.2, 0.5)
    c = 0.9 - a - b
    if 0.3 < c < 0.8:
        break

print(a, b, c)

Just draw the first two numbers at random, then subtract them from the target to get the third 'random number'. Check that it satisfies its boundary conditions; if not, repeat.
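A sketch of how the same rejection idea could be extended to an arbitrary list of (low, high) tuples, using the function name and signature from the question (this is an illustration, not part of the original answer):

from random import uniform

def get_rnd_numbers(bounds, target, max_tries=100000):
    # Rejection sampling: draw all but the last value uniformly within
    # their bounds, derive the last value from the target sum, and
    # retry until it falls inside its own bounds.
    lo_last, hi_last = bounds[-1]
    for _ in range(max_tries):
        values = [uniform(lo, hi) for lo, hi in bounds[:-1]]
        last = target - sum(values)
        if lo_last <= last <= hi_last:
            return values + [last]
    raise RuntimeError("no admissible sample found within max_tries")

print(get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9))

As noted in the comments below, the derived last value is not uniformly distributed, so this only generalises the mechanics of the approach, not its distributional properties.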

Bayko
  • Wouldn't this change the distribution of c? I thought about randomising the order of parameters before. I am sorry I was not so clear, on this, but ideally I am looking for something that can work with an arbitrary number of parameters. – n1000 Jul 20 '18 at 19:35
  • Your distribution of c is not completely random. It is dependent on a and b. The moment you find a and b your c is 'fixed'. – Bayko Jul 20 '18 at 19:43
  • Can you explain why you think this cannot be extended to an arbitrary number of tuples? You will just have to put more statements inside the while loop – Bayko Jul 20 '18 at 19:47
  • I was thinking in terms of a function that can accept different inputs like `get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8), ...], target=0.9)` – n1000 Jul 20 '18 at 20:07
  • Here is a histogram for the three random variables (n=10000): https://imgur.com/a/vdJWjr8 Unfortunately, they are not uniformly distributed and the first value (blue) does not use the full space (0.0, 1.0). – n1000 Sep 02 '18 at 20:23

Ok, here is some idea/code to play with.

We will sample from a Dirichlet distribution, so the sum objective is automatically fulfilled.

Then to each x_i sampled from the Dirichlet we apply a linear transformation with a different lower boundary l_i but the same scaling parameter s:

v_i = l_i + s*x_i

From the summation objective (Σ_i means summation over i) and the fact that Dirichlet samples always sum to 1,

Σ_i v_i = target

we can compute s:

s = target - Σ_i l_i
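
For example, with the bounds from the question, s = 0.9 - (0.0 + 0.2 + 0.3) = 0.4.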

Let's put the mean value of each v_i right in the middle of its interval:

E[v_i] = l_i + s*E[x_i] = (l_i + h_i) / 2

E[x_i] = (h_i - l_i) / (2*s)

And let's introduce a knob that is basically proportional to the inverse variance of the Dirichlet: the bigger the knob, the tighter the sampled random values are around the mean.

So for the Dirichlet distribution's array of alpha parameters,

alpha_i = E[x_i] * vscale

where vscale is a user-defined variance scale factor. We check whether the sampled values violate the lower or upper boundary conditions and reject the sample if they do.
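
For example, with the bounds from the question, s = 0.4 and vscale = 3.0, this gives E[x_i] = (1.25, 0.375, 0.625) and alpha_i = (3.75, 1.125, 1.875).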

Code, Python 3.6, Anaconda 5.2

import numpy as np

boundaries = np.array([[0.0, 1.0], [0.2, 0.5], [0.3, 0.8]])
target = 0.9

def get_rnd_numbers(boundaries, target, vscale):
    lo = boundaries[:, 0]
    hi = boundaries[:, 1]

    # Common scaling factor from the summation objective.
    s = target - np.sum(lo)

    # Dirichlet alphas: mean in the middle of each interval,
    # spread controlled by vscale.
    alpha_i = (0.5 * (hi - lo) / s) * vscale
    print(np.sum(alpha_i))  # diagnostic output

    # Sample from the Dirichlet and apply the linear transformation.
    x_i = np.random.dirichlet(alpha_i, size=1)
    v_i = lo + s * x_i

    # Flag samples that fall outside the requested boundaries.
    good_lo = not np.any(v_i < lo)
    good_hi = not np.any(v_i > hi)

    return (good_lo, good_hi, v_i)

vscale = 3.0

# Draw a few samples and keep only those that respect the boundaries.
for _ in range(3):
    gl, gh, v = get_rnd_numbers(boundaries, target, vscale)
    print((gl, gh, v, np.sum(v)))
    if gl and gh:
        print("Good sample, use it")

You could play with different transformation ideas, and maybe remove or replace the mean condition with something more sensible. I would advise keeping the idea of the knob, so that you can tighten the sampling spread.
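If a draw violates the boundaries, it can simply be redrawn. A minimal retry wrapper around the get_rnd_numbers above (an illustration added here, not part of the original answer) could look like this:

def sample_until_valid(boundaries, target, vscale, max_tries=1000):
    # Redraw until both boundary checks pass, or give up after max_tries.
    for _ in range(max_tries):
        good_lo, good_hi, v_i = get_rnd_numbers(boundaries, target, vscale)
        if good_lo and good_hi:
            return v_i
    raise RuntimeError("no admissible sample found within max_tries")

print(sample_until_valid(boundaries, target, vscale=3.0))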

Severin Pappadeux
  • Finally I came around to implement your idea (thanks again). However, it seems like this will actually not give uniform distributions... Find a histogram for the three variables from `get_rnd_numbers([(0.0, 1.0), (0.2, 0.5), (0.3, 0.8)], target=0.9, vscale=3.0)` here: https://imgur.com/a/2x9MajD – n1000 Sep 02 '18 at 19:44
  • @n1000 Sorry, a bit late coming back. I'm not sure it ever could be a uniform - from Dirichlet wiki page you could see that even in simplest case when all a_i=1, expected value E[X] = 1/n,n=3 so it would be 0.333, far from 0.5 required for uniform. Variance is different as well. `n` going up means each mean value is decreasing, and it is easy to understand why - sum condition suppress mean. – Severin Pappadeux Sep 04 '18 at 21:20