4

I have seen other posts addressing similar problem. I know how to generate N positive integers. I also know how to restrict the sum of randomly generated integers. The only problem is satisfying the condition that none of the N values fall out of the specified range.

e.g. generate_ints(n, total, low, high) should generate n value array such that each value is between low and high and the sum adds up to the total. Any pointers/ help would be greatly appreciated.

e.g.generate_ints(4, 40, 4, 15) should generate something like

[7,10,13,10]

I don't care if the numbers are repeated, as long as they are not highly skewed. I am using np.randon.randint(5,15,n) to select the integer.

So far, I have tried the following, but it doesn't work -

import numpy as np 
import random 
from random import uniform as rand 

total=50 
n=10 
low=2 
high=15 
result=[] 
m=0 
nobs=1 
while nobs <= n: 
    if m >= (total - low): 
        last_num= total -new_tot 
        result.append(last_num) 
    else: 
        next_num=np.random.randint(low,high,1) 
        new_tot = sum(result) + next_num 
        result.append(next_num) 
        m=new_tot 
    nobs +=1 

print result 
print sum(result)

Thanks again.

idjaw
  • 25,487
  • 7
  • 64
  • 83
sumoka
  • 117
  • 1
  • 2
  • 12
  • 1
    added my code to the question. thanks@idjaw – sumoka Oct 25 '16 at 04:17
  • "As far as they are not highly skewed". I think that is impossible to implement unless you make that precise. – Peaceful Oct 25 '16 at 04:26
  • Ideal would be a uniform distribution. – sumoka Oct 25 '16 at 04:31
  • That's a nice ideal, but unless you can specify how, the question is really too broad. – tripleee Oct 25 '16 at 04:45
  • 1
    @tripleee I consider the description sufficient. Uniform means in this case: from all feasible solutions (n numbers added within these ranges equal to the target); sample one uniformly. (That's of course not the approach to pursue in practice) – sascha Oct 25 '16 at 12:04

3 Answers3

1
import numpy as np

def sampler(samples, sum_to , range_list):
    assert range_list[0]<range_list[1], "Range should be a list, the first element of which is smaller than the second"
    arr = np.random.rand(samples)
    sum_arr = sum(arr)

    new_arr = np.array([int((item/sum_arr)*sum_to) if (int((item/sum_arr)*sum_to)>range_list[0]and int((item/sum_arr)*sum_to)<range_list[1]) \
                            else np.random.choice(range(range_list[0],range_list[1]+1)) for item in arr])
    difference = sum(new_arr) - sum_to
    while difference != 0:
        if difference < 0 :
                for idx in np.random.choice(range(len(new_arr)),abs(difference)):
                    if new_arr[idx] != range_list[1] :
                        new_arr[idx] +=  1

        if difference > 0:
                for idx in np.random.choice(range(len(new_arr)), abs(difference)):
                    if new_arr[idx] != 0 and new_arr[idx] != range_list[0] :
                        new_arr[idx] -= 1
        difference = sum(new_arr) - sum_to
    return new_arr

new_arr = sampler (2872,30000,[5,15])
print "Generated random array is :"
print new_arr
print "Length of array:", len(new_arr)
print "Max of array: ", max(new_arr)
print "min of array: ", min(new_arr)
print "and it sums up to %d" %sum(new_arr)

result :

Generated random array is :
[ 9 10  9 ...,  6 15 11]
Length of array: 2872
Max of array:  15
min of array:  5
and it sums up to 30000
Kennet Celeste
  • 4,593
  • 3
  • 25
  • 34
  • Ran it 1000 times and got results from 177 to 183 – Francisco Oct 25 '16 at 05:27
  • @FranciscoCouzo fixed – Kennet Celeste Oct 25 '16 at 06:02
  • This approach is not uniform (has some bias; in my opinion) but might be enough for some tasks (and is fast compared to rejection-sampling). – sascha Oct 25 '16 at 12:07
  • @sumoka If the answer addresses your question accept it by clicking on the checkmark of that answer ([like here](http://i.stack.imgur.com/QpogP.png)) so that others notice it will work in the first look. – Kennet Celeste Oct 25 '16 at 18:29
  • It doesn't seem to work on larger population e.g. low=5 , high=15, n=2872, total=30000. I see it generates random integers but not within the bounds. – sumoka Oct 25 '16 at 18:46
  • @sumoka give me numbers, how many sample sum up to how much ? – Kennet Celeste Oct 25 '16 at 18:48
  • 2872 samples that should sum up to 30000 with values between 5 & 15. I wrote it as a function.. Maybe I messed up something while writing it as a function. – sumoka Oct 25 '16 at 18:53
1

If I understand the specifications correctly, you wish to randomly generate restricted integer compositions such that each possible composition has an equal likelihood of being chosen.

We can adapt this answer to the problem of uniformly generating a random integer partition to solve this problem exactly for small input values. We simply need a way to count restricted k-compositions. There's a recursive formulation in this answer on Mathematics to a related problem, but it turns out that there is a more explicit formula mentioned as part of this answer that uses binomial coefficients. Here's a implementation in pure Python:

import functools
import random

@functools.lru_cache(1 << 10)
def C1(n, k, a, b):
    "Counts the compositions of `n` into `k` parts bounded by `a` and `b`" 
    return C2(n - k*(a - 1), k, b - (a - 1))

def C2(n, k, b):
    "Computes C(n, k, 1, b) using binomial coefficients"
    total = 0
    sign = +1

    for i in range(0, k + 1):
        total += sign * choose(k, i) * choose(n - i*b - 1, k - 1)
        sign = -sign

    return total

def choose(n, k):
    "Computes the binomial coefficient of (n, k)"
    if k < 0 or k > n:
        return 0

    if k == 0 or k == n:
        return 1

    k = min(k, n - k)
    c = 1

    for i in range(k):
        c = c * (n - i) // (i + 1)

    return c

def check_pre_and_post_conditions(f):
    def wrapper(n, k, a, b):
        assert 1 <= k <= n, (n, k)
        assert 1 <= a <= b <= n, (n, a, b)
        assert k*a <= n <= k*b, (n, k, a, b)

        comp = f(n, k, a, b)

        assert len(comp) == k, (len(comp), k, comp)
        assert sum(comp) == n, (sum(comp), n, comp)
        assert all(a <= x <= b for x in comp), (a, b, comp)

        return comp
    return functools.wraps(f)(wrapper)

@check_pre_and_post_conditions
def random_restricted_composition(n, k, a, b):
    "Produces a random composition of `n` into `k` parts bounded by `a` and `b`"
    total = C1(n, k, a, b)
    which = random.randrange(total)
    comp = []

    while k:
        for x in range(a, min(b, n) + 1):
            count = C1(n - x, k - 1, a, b)

            if count > which:
                break

            which -= count

        comp.append(x)
        n -= x
        k -= 1

    return comp

To select a random composition, we simply generate a random index smaller than the total number of possible compositions, and then construct the i-th lexicographic composition (see the linked questions for explanations of the recurrence relations used). This should produce all possible outcomes with equal probability.

However, because C1(n, k, a, b) grows exponentially, this method is pretty slow for large values of n and k. For large values, an approximate solution will serve you better.

Community
  • 1
  • 1
0

Here's my attempt which I will explain.

import numpy as np


def generate_ints(n, total, low, high):
    begin = 0
    randys = []
    correctTotal = False
    while correctTotal is False:
        while begin < n:
            r1 = np.random.randint(low, high, 1)
            randys.append(r1)
            begin += 1
        if sum(randys) == total:
            correctTotal = True
        else:
            begin = 0
            del randys[:]
    generated_list = np.array(randys).tolist()
    gen = [g[0] for g in generated_list]
    return gen


my_list = generate_ints(4, 40, 4, 15)
print "Generated list '{}' with sum {}.".format(my_list, sum(my_list))

Inside the function, I've set two constants, randys and begin. In the inner while loop, as long as begin is less than n it generates n random integers between low and high. If the sum is equivalent to the total, exit out of the outer while loop, otherwise it needs to reset the constants.

Just returning randys will give a list of NumPy arrays. Using the tolist() method, this produces a list instead.

Now we have a list of lists. I've flattened it using a short and sweet list comprehension. Finally return that list and output as desired.

HTH.

Mangohero1
  • 1,832
  • 2
  • 12
  • 20
  • The code is not that concise (e.g. using a loop for drawing multiple random numbers; it's provided out of the box) and this approach is infeasible for large n or troublesome bounds, but this approach is bias-free and if output is generated, it's uniform. The general algorithmic framework used is called **rejection-sampling**. – sascha Oct 25 '16 at 12:06
  • @sascha That's interesting, I didn't know that was provided! Sorry if my answer turned out to be infeasible, I'm frankly have very little knowledge of numpy but I'll look into this more – Mangohero1 Oct 25 '16 at 13:14
  • Actually, i upvoted it because of the approach (it's not an easy problem). Not sure who downvoted for which reasons (although i did not test your code). – sascha Oct 25 '16 at 13:29
  • Thanks, I agree it's not an easy problem especially generating data with a certain distribution. I was looking for a way to make the "total" a function of "n" and the bounds "low" and "high" . Both these solutions solve that problem. I am not 100% clear about how it works but I'll decode line by line. Thanks again. – sumoka Oct 25 '16 at 16:07