Generate list of random number with condition - numpy

Question

I would like to generate a list of 15 integers with sum 12, minimum value is 0 and maximum is 6.

I tried following code

def generate(low,high,total,entity):
   while sum(entity)!=total:
       entity=np.random.randint(low, high, size=15)
   return entity

But above function is not working properly. It is too much time consuming. Please let me know the efficient way to generate such numbers?

Well strictly speaking it is working, but generate and test is usually not very efficient. It here will usually take tenthousands of generates, before you generate a correct sequence. — Willem Van Onsem, Sep 22 '19 at 09:53

score 4 · Accepted Answer · answered Sep 22 '19 at 10:45

The above will, strictly speaking work. But for 15 numbers between 0 and 6, the odds of generating 12 is not that high. In fact we can calculate the number of possibilities with:

F(s, 1) = 1 for 0≤s≤6 and

F(s, n) = Σ⁶_i=0F(s-i, n-1).

We can calculate that with a value:

from functools import lru_cache

@lru_cache()
def f(s, n, mn, mx):
    if n < 1:
        return 0
    if n == 1:
        return int(mn <= s <= mx)
    else:
        if s < mn:
            return 0
        return sum(f(s-i, n-1, mn, mx) for i in range(mn, mx+1))

That means that there are 9'483'280 possibilities, out of 4'747'561'509'943 total possibilities to generate a sum of 12, or 0.00019975%. It will thus take approximately 500'624 iterations to come up with such solution.

We thus should better aim to find a straight-forward way to generate such sequence. We can do that by each time calculating the probability of generating a number: the probability of generating i as number as first number in a sequence of n numbers that sums up to s is F(s-i, n-1, 0, 6)/F(s, n, 0, 6). This will guarantee that we generate a uniform list over the list of possibilities, if we would each time draw a uniform number, then it will not match a uniform distribution over the entire list of values that match the given condition:

We can do that recursively with:

from numpy import choice

def sumseq(n, s, mn, mx):
    if n > 1:
        den = f(s, n, mn, mx)
        val, = choice(
            range(mn, mx+1),
            1,
            p=[f(s-i, n-1, mn, mx)/den for i in range(mn, mx+1)]
        )
        yield val
        yield from sumseq(n-1, s-val, mn, mx)
    elif n > 0:
        yield s

With the above function, we can generate numpy arrays:

>>> np.array(list(sumseq(15, 12, 0, 6)))
array([0, 0, 0, 0, 0, 4, 0, 3, 0, 1, 0, 0, 1, 2, 1])
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([0, 0, 1, 0, 0, 1, 4, 1, 0, 0, 2, 1, 0, 0, 2])
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([0, 1, 0, 0, 2, 0, 3, 1, 3, 0, 1, 0, 0, 0, 1])
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([5, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1])
>>> np.array(list(sumseq(15, 12, 0, 6)))
array([0, 0, 0, 0, 4, 2, 3, 0, 0, 0, 0, 0, 3, 0, 0])

Isn't it a multinomial distribution defined in some, frankly, strange way? — Severin Pappadeux, Sep 23 '19 at 13:50
@SeverinPappadeux: that could be the case, I will try to take a look at that. I had as idea that (a) the implementation should be straight-forward, so without a "reject", and (2) that it as if you would have drawn from a uniform sampling, and then do rejection. If I take a look at the formulas, it might be the case, although proving it will probably take some work :) — Willem Van Onsem, Sep 23 '19 at 14:16
If you find something interesting, I would appreciate if you send me a message, thank you — Severin Pappadeux, Sep 23 '19 at 15:10
I updated my answer and believe we proposed different solutions, please take a look at it — Severin Pappadeux, Sep 23 '19 at 15:40

score 2 · Answer 2 · answered Sep 22 '19 at 10:43

You could try it implementing it a little bit differently.

import random
def generate(low,high,goal_sum,size=15):
    output = []
    for i in range(size):
        new_int = random.randint(low,high)
        if sum(output) + new_int <= goal_sum:
            output.append(new_int)
        else:
            output.append(0)
    random.shuffle(output)
    return output

Also, if you use np.random.randint, your high will actually be high-1

Severin Pappadeux · Answer 3 · 2019-09-23T15:39:53.470

Well, there is a simple and natural solution - use distribution which by definition provides you array of values with the fixed sum. Simplest one is Multinomial Distribution. The only code to add is to check and reject (and repeat sampling) if some sampled value is above maximum.

Along the lines

import numpy as np

def sample_sum_interval(n, p, maxv):
    while True:
        q = np.random.multinomial(n, p)
        v = np.where(q > maxv)
        if len(v[0]) == 0: # if len(v) > 0, some values are outside the range, reject
            return q
    return None

np.random.seed(32345)

k    = 15
n    = 12
maxv = 6
p = np.full((k), np.float64(1.0)/np.float64(k), dtype=np.float64) # probabilities

q = sample_sum_interval(n, p, maxv)
print(q)
print(np.sum(q))

q = sample_sum_interval(n, p, maxv)
print(q)
print(np.sum(q))

q = sample_sum_interval(n, p, maxv)
print(q)
print(np.sum(q))

UPDATE

I quickly looked at @WillemVanOnsem proposed method, and I believe it is different from multinomial used by myself.

If we look at multinomial PMF, and assume equal probabilities for all k numbers, p₁ = ... = p_k = 1/k, then we could write PMF as

PMF(x₁,...x_k)=n!/(x₁!...x_k!) p₁^x₁...p_k^x_k = n!/(x₁!...x_k!) k^-x₁...k^-x_k = n!/(x₁!...x_k!) k^-Sum_ix_i = n!/(x₁!...x_k!) k^-n

Obviously, probabilities of particular x₁...x_k combinations would be different from each other due to factorials in denominator (modulo permutations, of course), which is different from @WillemVanOnsem approach where all of them would have equal probabilities to appear, I believe.

Moral of the story - those methods produce different distributions.

Generate list of random number with condition - numpy

3 Answers3