0

How can I generate random numbers from the set (1, 2, 3, 4, 5, 6) that add up to N and follow the probability distribution (0.3, 0.25, 0.25, 0.1, 0.05, 0.05) in Java?

UPDATE: I can use EnumeratedIntegerDistribution as proposed in this answer but this doesn't take into account that the numbers should add up to N.

nilakshdas
  • 115
  • 1
  • 1
  • 7
  • 1
    https://xkcd.com/221/ – Tunaki Feb 20 '16 at 19:45
  • You could use the EnumeratedIntegerDistribution with a sample size of 1, and manage the sum (in a loop) up to N. As you approach N, you will have to discard any result which causes the sum to exceed N. – Ian Mc Feb 20 '16 at 20:01
  • I thought about that, but wouldn't that mess up the probability distribution at the fringe? Another way to do this can be generating with a sample size of N and picking the first m values that add up to the nearest number less than N and then keep dropping numbers until I reach N. – nilakshdas Feb 20 '16 at 20:07
  • I don't know the answer to that. If N is sufficiently large, it won't make a difference. If N=1, then you can have no randomness. What to do if N=60 and your running sum=58? My approach allows probability for two ones (59, then 60), or a two. I'd be interested in what the correct answer is. One drastic approach is to discard any result that does not SUM to N, and try again from scratch until it does. – Ian Mc Feb 20 '16 at 20:16

1 Answers1

1

This is an interesting variation on the more common problem of integer partitions.

It is relatively straightforward to generate integer partitions of N, so you have a set of lists of integers which sum to N. My idea is that you can assign probability to each such partition, according to the probabilities in your specified distribution. Then sample from the set of partitions according to the calculated per-partition probabilities. I believe the per-partition probability should be just the product of the probabilities of its elements.

This scheme is workable for N which is not too big -- otherwise there are too many partitions. Not sure what to do if the list of partitions is too big. For comparison, the number of integers partitions of 60 is a little less than one million (actually 966467).

I think you can use a rejection sampling scheme: from the list of partitions, pick one (with uniform probability). Then generate a random number between 0 and 1 and look at the calculated probability for that partition. If the random number is less than the calculated probability, take that partition. Otherwise try again; keep trying until you take one.

It occurs to me that maybe just multiplying the per-element probabilities is incorrect. It's going to make the probabilities for long partitions much smaller than for short partitions; I don't know if that's right. Maybe you can think of a different way to assign probabilities to partitions.

EDIT: maybe take product of per-element probabilities divided by the greatest such product (so one of the acceptance probabilities is 1 and the rest are smaller).

EDIT2: I think the scheme above will work, but the per-partition probability needs to be multiplied by a multinomial coefficient (namely n!/(n1!n2!...nm!) where n = length of partition and n1, ..., nm are the numbers of the m distinct elements in the partition).

But maybe a simpler scheme will work: generate random numbers while the sum is less than N. If it's exactly N then return that list. If not, remove elements at random (from anywhere in the list) until the sum is less than N, then start adding elements again.

Robert Dodier
  • 16,905
  • 2
  • 31
  • 48