1

I need to generate binomial random numbers:

For example, consider binomial random numbers. A binomial random number is the number of heads in N tosses of a coin with probability p of a heads on any single toss. If you generate N uniform random numbers on the interval (0,1) and count the number less than p, then the count is a binomial random number with parameters N and p.

In my case, my N could range from 1*10^3 to 1*10^10. My p is around 1*10^(-7).

Often my n*p is around 1*10^(-3).

There is a trivial implementation to generate such binomial random number through loops:

public static int getBinomial(int n, double p) {
  int x = 0;
  for(int i = 0; i < n; i++) {
    if(Math.random() < p)
      x++;
  }
  return x;
}

This native implementation is very slow. I tried the Acceptance Rejection/Inversion method [1] implemented in the Colt (http://acs.lbl.gov/software/colt/) lib. It is very fast, but the distribution of its generated number only agrees with the native implementation when n*p is not very small. In my case when n*p = 1*10^(-3), the native implementation can still generate the number 1 after many runs, but the Acceptance Rejection/Inversion method can never generate the number 1 (always returns 0).

Does anyone know what is the problem here? Or can you suggest a better binomial random number generating algorithm that can solve my case.

[1] V. Kachitvichyanukul, B.W. Schmeiser (1988): Binomial random variate generation, Communications of the ACM 31, 216-222.

Changwang Zhang
  • 2,467
  • 7
  • 38
  • 64
  • 1
    Isn't this a duplicate of your [previous question](http://stackoverflow.com/q/23561551/812912) – Ivaylo Strandjev May 10 '14 at 09:58
  • @IvayloStrandjev Yes, it is. This question has been migrated from mathoverflow.net. – Ali May 10 '14 at 12:41
  • Did you try the cdf method? http://en.wikipedia.org/wiki/Inverse_transform_sampling – usul May 10 '14 at 15:34
  • @usul CDF inversion is in general not a good way to go for discrete distributions. Since the CDF for discrete distributions is a step function, it involves searching to find which steps you fall between. – pjs May 10 '14 at 21:27
  • @pjs, sure, but you can use binary search, or when np is smaller than 1, almost certainly a search through the first few values will find it. I'm not asserting this will be fast since I haven't tried it, but it's not obvious why searching would be slow. – usul May 10 '14 at 22:59
  • @usul The search would take O(n) to set up, and then if you did binary search O(log n) for each value generated. For the same O(n) setup you might as well build an alias table and do the generation in O(1) time. – pjs May 10 '14 at 23:30

2 Answers2

1

If n*p is a fixed small number t, and n is much larger than 1/t, then the binomial distribution is very close to the Poisson distribution, which returns k with probability e^{-t} t^k/k!.

Here is some pseudocode

r=e^t * RandomReal[0,1]; 
s=1;
k=0;
While[s<r, 
  (k++; s=s+t^k/k!;)]
Return k;

If t is really small, it will be pretty hard to tell the difference between this and a routine that just returns 0 with probability 1-t and t the rest of the time. For example, if t=0.001 and n is large, then the probabilities of various values of k are

k=0  0.9990005
k=1  0.0009990
k=2  0.0000005
k>2  1.7 * 10^{-10}
0

Where np is very small, only the very smallest values of n are at all likely. You could work out the probabilities of those and then use http://en.wikipedia.org/wiki/Alias_method. If you are feeling extra scrupulous, you could work out the probability that some value above those you are prepared to deal with occurs instead, and divert to a special case method with this probability, such as generating a second alias table to cope with the most likely N values above those your first alias method coped with.

mcdowella
  • 19,301
  • 2
  • 19
  • 25