Selecting a random sample from a very large generator

Question

I am trying to test some strategies for a game, which can be defined by 10 non-negative integers that add up to 100. There are 109 choose 9, or roughly 10^12 of these, so comparing them all is not practical. I would like to take a random sample of about 1,000,000 of these.

I have tried the methods from the answers to this question, and this one, but all still seem far too slow to work. The quickest method seems like it will take about 180 hours on my machine.

This is how I've tried to make the generator (adapted from a previous SE answer). For some reason, changing prob does not seem to impact the run time of turning it into a list.

def tuples_sum_sample(nbval,total, prob, order=True) :
    """ 
        Generate all the tuples L of nbval positive or nul integer 
        such that sum(L)=total.
        The tuples may be ordered (decreasing order) or not
    """
    if nbval == 0 and total == 0 : yield tuple() ; raise StopIteration
    if nbval == 1 : yield (total,) ; raise StopIteration
    if total==0 : yield (0,)*nbval ; raise StopIteration
    for start in range(total,0,-1) :
        for qu in tuples_sum(nbval-1,total-start) :
            if qu[0]<=start :
                sol=(start,)+qu
                if order :
                    if random.random() <prob:
                        yield sol
                else :
                    l=set()
                    for p in permutations(sol,len(sol)) :
                        if p not in l :
                            l.add(p)
                            if random.random()<prob:
                                yield p

Rejection sampling seems like it would take about 3 million years, so this is out as well.

randsample = []
while len(randsample)<1000000:
    x = (random.randint(0,100),random.randint(0,100),random.randint(0,100),random.randint(0,100),random.randint(0,100),random.randint(0,100),random.randint(0,100),random.randint(0,100),random.randint(0,100),random.randint(0,100))
    if sum(x) == 100:
        randsample.append(x)
randsample

Can anyone think of another way to do this?

Thanks

semicolons... in python... good lord. In all seriousness your code is very hard to follow. You don't follow the style conventions and you don't use very descriptive variable names. `l` is a classic example of the worst variable name you can use. Use something like `found_set` — FHTMitchell, Sep 04 '18 at 12:11
@FHTMitchell Thanks for the feedback, still a beginner. Code was mostly stolen from elsewhere, warts and all. — Trying Not To Be Dumb, Sep 04 '18 at 12:32
Have you tried [this method](https://stackoverflow.com/a/3590105/789671)? — glibdud, Sep 04 '18 at 12:42
@glibdud I just tried that, it works very well, similar to Eric Duminil's answer below. — Trying Not To Be Dumb, Sep 04 '18 at 12:46

NateTheGrate · Answer 1 · 2018-09-04T12:40:21.237

1

A couple of frame-challenging questions:

Is there any reason you must generate the entire population, then sample that population?
Why do you need to check if your numbers sum to 100?

You can generate a set of numbers that sum to a value. Check out the first answer here:

Random numbers that add to 100: Matlab

Then generate the number of such sets you desire (1,000,000 in this case).

import numpy as np

def set_sum(number=10, total=100):
    initial = np.random.random(number-1) * total

    sort_list = np.append(initial, [0, total]).astype(int)
    sort_list.sort()

    set_ = np.diff(sort_list)

    return set_

if __name__ == '__main__':
    import timeit

    a = set_sum()

    n = 1000000
    sample = [set_sum() for i in range(n)]

edited Sep 04 '18 at 12:40

answered Sep 04 '18 at 12:26

NateTheGrate

590
3
11

Interesting approach. Your method seems to return results with a wider deviation than mine. I'm not sure which one is closer to being uniform. – Eric Duminil Sep 04 '18 at 12:47
That's very interesting...this is ~supposed~ to be uniform, as touted by the answer I linked. I'd have to do more research to check – NateTheGrate Sep 04 '18 at 13:16

Eric Duminil · Answer 2 · 2018-09-04T12:42:39.433

Numpy to the rescue!

Specifically, you need a multinomial distribution:

import numpy as np
desired_sum = 100
n = 10
np.random.multinomial(desired_sum, np.ones(n)/n, size=1000000)

It outputs a matrix with a million rows of 10 random integers in a few seconds. Each row sums up to 100.

Here's a smaller example:

np.random.multinomial(desired_sum, np.ones(n)/n, size=10)

which outputs:

array([[ 8,  7, 12, 11, 11,  9,  9, 10, 11, 12],
       [ 7, 11,  8,  9,  9, 10, 11, 14, 11, 10],
       [ 6, 10, 11, 13,  8, 10, 14, 12,  9,  7],
       [ 6, 11,  6,  7,  8, 10,  8, 18, 13, 13],
       [ 7,  7, 13, 11,  9, 12, 13,  8,  8, 12],
       [10, 11, 13,  9,  6, 11,  7,  5, 14, 14],
       [12,  5,  9,  9, 10,  8,  8, 16,  9, 14],
       [14,  8, 14,  9, 11,  6, 10,  9, 11,  8],
       [12, 10, 12,  9, 12, 10,  7, 10,  8, 10],
       [10,  7, 10, 19,  8,  5, 11,  8,  8, 14]])

The sums appear to be correct:

sum(np.random.multinomial(desired_sum, np.ones(n)/n, size=10).T)
# array([100, 100, 100, 100, 100, 100, 100, 100, 100, 100])

Python only

You could also start with a list on 10 zeroes, iterate 100 times and increment a random cell each time :

import random
desired_sum = 100
n = 10
row = [0] * n
for _ in range(desired_sum):
    row[random.randrange(n)] += 1

row
# [16, 7, 9, 7, 10, 11, 4, 19, 4, 13]
sum(row)
# 100

Thank you so much. This works perfectly. I guess the idea is that rather than picking a random solution, we roll a 10-sided dice 100 times for each row? — Trying Not To Be Dumb, Sep 04 '18 at 12:37
I don't think rolling 10 sided dice is the best metaphor. But that's the general idea, getting a deterministic answer to the problem, rather than trying a ton of combinations — NateTheGrate, Sep 04 '18 at 13:15

Selecting a random sample from a very large generator

2 Answers2

Python only