0

I need to generate a file filled with three "random" values per line (10 lines), but those values sum must equal 15.

The structure is: "INDEX A B C".

Example:

1 15 0 0
2 0 15 0
3 0 0 15
4 1 14 0
5 2 13 0
6 3 12 0
7 4 11 0
8 5 10 0
9 6 9 0
10 7 8 0

3 Answers3

4

If you want to avoid needing to create (or iterate through) the full space of satisfying permutations (which, for large N is important), then you can solve this problem with sequential sample.

My first approach was to just draw a value uniformly from [0, N], call it x. Then draw a value uniformly from [0, N-x] and call it y, then set z = N - x - y. If you then shuffle these three, you'll get back a reasonable draw from the space of solutions, but it won't be exactly uniform.

As an example, consider where N=3. Then the probability of some permutation of (3, 0, 0) is 1/4, even though it is only one out of 10 possible triplets. So this privileges values that contain a high max.

You can perfectly counterbalance this effect by sampling the first value x proportionally to how many values will be possible for y conditioned on x. So for example, if x happened to be N, then there is only 1 compatible value for y, but if x is 0, then there are 4 compatible values, namely 0 through 3.

In other words, let Pr(X=x) be (N-x+1)/sum_i(N-i+1) for i from 0 to N. Then let Pr(Y=y | X=x) be uniform on [0, N-x].

This works out to P(X,Y) = P(Y|X=x) * P(X) = 1/(N-x+1) * [N - x + 1]/sum_i(N-i+1), which is seen to be uniform, 1/sum_i(N-i+1), for each candidate triplet.

Note that sum(N-i+1 for i in range(0, N+1)) gives the number of different ways to sum 3 non-negative integers to get N. I don't know a good proof of this, and would happy if someone adds one to the comments!

Here's a solution that will sample this way:

import random
from collections import Counter

def discrete_sample(weights):
    u = random.uniform(0, 1)
    w_t = 0
    for i, w in enumerate(weights):
        w_t += w
        if u <= w_t:
            return i
    return len(weights)-1

def get_weights(N):
    vals = [(N-i+1.0) for i in range(0, N+1)]
    totl = sum(vals)
    return [v/totl for v in vals]

def draw_summing_triplet(N):
    weights = get_weights(N)
    x = discrete_sample(weights)
    y = random.randint(0, N-x)
    triplet = [x, y, N - x - y]
    random.shuffle(triplet)
    return tuple(triplet)

Much credit goes to @DSM in the comments for questioning my original answer and providing good feedback.

In this case, we can test out the sampler like this:

foo = Counter(draw_summing_triplet(3) for i in range(10**6))
print foo

Counter({(1, 2, 0): 100381, 
         (0, 2, 1): 100250, 
         (1, 1, 1): 100027, 
         (2, 1, 0): 100011, 
         (0, 3, 0): 100002, 
         (3, 0, 0): 99977, 
         (2, 0, 1): 99972,
         (1, 0, 2): 99854, 
         (0, 0, 3): 99782, 
         (0, 1, 2): 99744})
ely
  • 74,674
  • 34
  • 147
  • 228
  • 1
    I'm not sure I agree with your argument for uniformity. The distribution on x is uniform, but that means there's a 1/16 change of getting (15, 0, 0), which is only 1/136 of the sample space. – DSM Oct 23 '14 at 01:17
  • Yeah, I think you're right. You have to add a meta distribution over which component 'x' will be (equally likely to be spot 1, 2, or 3 in the tuple). The property that I am missing is [exchangeability](http://en.wikipedia.org/wiki/Exchangeable_random_variables), will fix. – ely Oct 23 '14 at 01:30
  • I believe that fixes it (doing a `random.shuffle` on the output, so that each component has an equal chance of getting "first dibs" on the range of values. But please check and comment again. Thank you for checking! – ely Oct 23 '14 at 01:32
  • I don't think that will work for two reasons. One is shallow -- `random.shuffle` acts in place and returns `None`. :-) The deeper one is that while this deals with the order issue, it doesn't fix the fact that the *permutations* of (15,0,0) are still likelier to occur. One way to quickly see the issue is to do something like `Counter(draw_summing_triplet(3) for i in range(10**6))`. – DSM Oct 23 '14 at 01:36
  • 1
    You're very right. I'm suddenly much more interested in this problem. – ely Oct 23 '14 at 01:41
  • I guess this has more in common with the problem of sampling matrices that have fixed row sums, etc., where once you get into high dimensions you have to go really far with extremely complicated combinatorial arguments used with things like Metropolis-Hastings. That makes the generator-based approach much more likely to be the best general case approach. – ely Oct 23 '14 at 01:50
  • Although, for N large enough, say close to 3000, (or different based on the period of the RNG used), even the generator approach couldn't sample the possibilities. I'm often really surprised by how small things need to be before permutations of them are intractable. 5000 isn't a terribly large number for which you might want to sample from the list of 3000-tuples that can sum to 5000. That actually could happen somewhere in practice. [Here's a link about it.](http://stackoverflow.com/questions/3062741/maximal-length-of-list-to-shuffle-with-python-random-shuffle). – ely Oct 23 '14 at 01:53
  • @DSM Don't know if you're still around, but I believe the edits I made demonstrate a correct version with sequential sampling leading to a uniform draw over the sample space. – ely Oct 23 '14 at 03:10
  • Nice! I'd thought you'd need a number_of_partitions function, but since there are only three elements we don't. Your `sum(N-i+1 for i in range(0, N+1))` fact I think follows as a property of triangular numbers. – DSM Oct 24 '14 at 18:47
  • Yes that's right -- and also yes, generalizing to more than a triplet seems that it would be hard. – ely Oct 24 '14 at 18:49
2

If the numbers can by any just use combinations:

from itertools import combinations
with open("rand.txt","w") as f:
    combs = [x for x in combinations(range(16),3) if sum(x ) == 15 ][:10]   
    for a,b,c in combs:
        f.write("{} {} {}\n".format(a,b,c))
Padraic Cunningham
  • 176,452
  • 29
  • 245
  • 321
  • I think you want `range(16)` there, given his first three rows. – abarnert Oct 22 '14 at 23:34
  • 1
    Also, it's a bit inefficient to throw away 541/560 of the combinations you generate… but I doubt that will matter, and I think the nice simplicity of this code more than makes up for it. – abarnert Oct 22 '14 at 23:36
  • @abarnert, yes, I changed to 16. Certainly not efficient but for what the OP wants it should be fine. – Padraic Cunningham Oct 22 '14 at 23:37
  • Also, to the OP: `enumerate` with a `start=1` will give you the "index" to go with the `a, b, c` values. (I don't think that needs to be in the answer; it's not the interesting part of the question he was asking about.) – abarnert Oct 22 '14 at 23:41
  • That works, but the previous solution seems to better fit my needs. But, thank you! – William Fernandes Oct 22 '14 at 23:41
  • I actually had used enumerate locally but there was just too much going on in the for loop so I left it out of the answer – Padraic Cunningham Oct 22 '14 at 23:42
0

This seems straight forward to me and it utilizes the random module.

import random

def foo(x):
    a = random.randint(0,x)
    b = random.randint(0,x-a)
    c = x - (a +b)

    return (a,b,c)

for i in range(100):
    print foo(15)
shrewmouse
  • 5,338
  • 3
  • 38
  • 43
  • This is identical to my answer, with the addition of the unnecessary `if` statement to check for `x-a == 0`. `random.randint` can take the same value for low and high, like `random.randint(0, 0)`. – ely Oct 23 '14 at 00:55
  • Yeah, you're right the check for zero doesn't add anything. – shrewmouse Oct 23 '14 at 01:06
  • I suppose if the implementation of `random.randint` was very inefficient for the `random.randint(0, 0)` case, compared with the cost of doing the `if` comparison, it could be worthwhile. But `random.randint` is smart enough to check if the low bound equals the high bound. – ely Oct 23 '14 at 01:08
  • I must of been thinking about range returning an empty array in the case of range(0). Oh well. We got the same basic answer because I was off coding up the answer and testing it. Saw it on my phone and thought about it. – shrewmouse Oct 23 '14 at 01:11
  • Do note, as DSM explained in comments to my answer, that this *doesn't* create a uniform sample over the space of permissible triplets. – ely Oct 23 '14 at 02:02