
The relevant question is: Algorithm to generate Poisson and binomial random numbers?

I will just quote her description of a binomial random number:

For example, consider binomial random numbers. A binomial random number is the number of heads in N tosses of a coin with probability p of a heads on any single toss. If you generate N uniform random numbers on the interval (0,1) and count the number less than p, then the count is a binomial random number with parameters N and p.

There is a trivial solution in Algorithm to generate Poisson and binomial random numbers? that uses a loop:

public static int getBinomial(int n, double p) {
  int x = 0;
  for(int i = 0; i < n; i++) {
    if(Math.random() < p)
      x++;
  }
  return x;
}

However, my purpose in pursuing a binomial random number generator is precisely to avoid this inefficient loop (i from 0 to n). My n could be very large, and p is often very small.

A toy example of my case could be: n=1*10^6, p=1*10^(-7).

The n could range from 1*10^3 to 1*10^10.

Changwang Zhang
  • Simple remark: I've toyed around a bit with your problem, looking in particular at the difference between getBinomial(n, p) and the average n*p. The average difference when p=.5 goes up by a factor of roughly 3 every time n goes up by a factor of 10. In other words, when n is several millions, the result is going to be very, very close to the average. Depending on your use of this, you could easily make a decent approximation of the result when n is large. If the context is even slightly scientific, however, this observation is useless. – schmop May 09 '14 at 10:58
  • Unfortunately, p is very, very small in my case; it could be 0.1/(several millions). And the context is indeed scientific... But thank you for the help as well. @schmop – Changwang Zhang May 09 '14 at 11:23
  • In the case of very small p and very large N, one can approximate with the Poisson distribution (a minimal sketch of this follows these comments). I wonder just how precise you need to be? As p goes down and n goes up the distributions are closer and closer. – Ordous May 09 '14 at 13:30
  • For larger p and large N, you can approximate with a normal distribution. – David Eisenstat May 09 '14 at 16:35
  • @Leo I can only second Ordous: Have you considered approximating with Poisson distribution? – Ali May 10 '14 at 12:43
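
A minimal sketch of the Poisson approximation suggested in the comments above. getPoissonApprox is just an illustrative name, not code from any of the posts; Knuth's multiplication method is used only because lambda = n*p is tiny in the cases described (around 1E-3), and it would be slow and numerically fragile for large lambda.

// Sketch only: for large n and small p, Binomial(n, p) is well approximated
// by Poisson(lambda = n*p). Knuth's multiplication method takes O(lambda)
// iterations on average, so it is essentially constant time when n*p ~ 1E-3.
public static int getPoissonApprox(int n, double p) {
   double lambda = n * p;
   double limit = Math.exp(-lambda);   // P(X = 0) for Poisson(lambda)
   double product = Math.random();
   int k = 0;
   while (product > limit) {
      k++;
      product *= Math.random();
   }
   return k;                           // approximately Binomial(n, p) distributed
}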

2 Answers

8

If you have small p values, you'll like this one better than the naive implementation you cited. It still loops, but the expected number of iterations is O(np) so it's pretty fast for small p values. If you're working with large p values, replace p with q = 1-p and subtract the return value from n. Clearly, it will be at its worst when p = q = 0.5.

public static int getBinomial(int n, double p) {
   double log_q = Math.log(1.0 - p);   // log of the failure probability q = 1 - p
   int x = 0;
   double sum = 0;
   for(;;) {
      // Math.log(Math.random()) is minus an Exp(1) variate; the running sum
      // drifts downward and crosses log_q after a Binomial(n, p) number of steps
      sum += Math.log(Math.random()) / (n - x);
      if(sum < log_q) {
         return x;
      }
      x++;
   }
}

The implementation is a variant of Luc Devroye's "Second Waiting Time Method" on page 522 of his text "Non-Uniform Random Variate Generation."

There are faster methods based on acceptance/rejection techniques, but they are substantially more complex to implement.
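
For completeness, here is one way (a sketch, not part of the original answer) to wrap the large-p trick from the first paragraph; getBinomialSymmetric is just an illustrative name around the getBinomial method above.

public static int getBinomialSymmetric(int n, double p) {
   if (p > 0.5) {
      // sample the number of tails with the small probability q = 1 - p,
      // then convert back by symmetry: heads = n - tails
      return n - getBinomial(n, 1.0 - p);
   }
   return getBinomial(n, p);
}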

pjs
  • I have tested the Acceptance/Rejection/Inversion method in the colt lib: http://acs.lbl.gov/software/colt/. The speed is good, but the generated distribution of binomial numbers only agrees with the naive implementation (which I cited) when n*p is not that small. In my case, n*p is often 1*10^(-3) or even smaller. With the naive implementation you can get a binomial number of 1 after many runs, but the Rejection/Inversion method in colt was never able to return 1 (when n*p=1*10^(-3)). I will test whether the program you provided can tackle this. Why does the Rejection/Inversion method fail in my case? – Changwang Zhang May 10 '14 at 08:28
  • I don't know why that would be failing. However, given how rare the occurrences are, have you considered that a Poisson distribution might be a better model for you? – pjs May 10 '14 at 14:28
  • I think that this is OK because `Math.log(Math.random())` is always nonzero negative and nonzero negative/zero = infinity in IEEE arithmetic, but explicitly testing `x < n` before entering the loop would make me less nervous. – David Eisenstat May 10 '14 at 14:53
  • @DavidEisenstat Infinity is explicitly stated to be allowable in Devroye's comments about the algorithm, and Java has no problem with cranking out +/-Infinity and using them correctly for comparisons. – pjs May 10 '14 at 14:57
  • If n*p is in the range 1E-3, does the Acceptance/Rejection/Inversion method _always_ return 0 or does it return a handful of non-zero's if you run it, say 10K or 100K times? – pjs May 10 '14 at 15:19
  • As I said, it's fine in Java, but buried treasure if transliterated into any number of other languages. – David Eisenstat May 10 '14 at 15:51
  • @pjs I have tested. Your code is great and works even when n*p=1E-3. When n*p=1E-3, the Acceptance/Rejection/Inversion method always returns 0. I ran it 100k times; the naive implementation and your code both return 1 with probability around 9.4E-4, while the Acceptance/Rejection/Inversion method always returns 0. This means the Acceptance/Rejection/Inversion method is not theoretically sound, I suppose? – Changwang Zhang May 10 '14 at 19:52
  • @pjs When n*p=1E-2, the Acceptance/Rejection/Inversion method still works and agrees with the naive implementation and your code. n*p=1E-3 seems to be the threshold at which the Acceptance/Rejection/Inversion method stops working. Really weird. – Changwang Zhang May 10 '14 at 19:56
  • I know one of the co-authors of the Acceptance/Rejection/Inversion method, and have the utmost respect for his knowledge in this area. I have no doubt that the method is mathematically sound. You're probably running up against numerical issues. Lots of things are mathematically correct but only work computationally within certain bounds. Meanwhile, I'm glad to hear that Devroye's method works for you. – pjs May 10 '14 at 20:22
  • @pjs - I've implemented this function in Julia and would like to ensure I have the proper attribution. Could you please let me know 1) whether this is ok, and 2) how you'd like the attribution to read? Thanks. – sbromberger Apr 12 '17 at 00:08
  • @sbromberger Give credit to Luc Devroye, his book was a labor of love and a real contribution. The book is now out of print, so Devroye has made it available for free as a PDF online. You can get it here: http://www.eirene.de/Devroye.pdf – pjs Apr 12 '17 at 00:33
1

I could imagine one way to speed it up by a constant factor (e.g. 4).

After 4 throws you will have tossed 0, 1, 2, 3, or 4 heads.

The probabilities of those counts are something like [0.6561, 0.2916, 0.0486, 0.0036, 0.0001] (these particular values correspond to p = 0.1).

Now you can generate one random number and use it to simulate 4 of the original throws at once. If that's not clear, I can elaborate a little more.

This way, after some initial pre-calculation, you can speed up the process almost 4 times. The only requirement for it to be precise is that the granularity of your random number generator is at least p^4.
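
Here is a rough sketch of how the batching idea above might look in Java; the method name and structure are illustrative. The cumulative probabilities of 0..4 heads in a block of four tosses are precomputed once, after which a single uniform draw simulates each block.

public static int getBinomialBatched(int n, double p) {
   double q = 1.0 - p;
   // cumulative probabilities of 0..4 heads in 4 tosses: C(4,k) * p^k * q^(4-k)
   double[] cdf = new double[5];
   cdf[0] = q * q * q * q;
   cdf[1] = cdf[0] + 4 * p * q * q * q;
   cdf[2] = cdf[1] + 6 * p * p * q * q;
   cdf[3] = cdf[2] + 4 * p * p * p * q;
   cdf[4] = 1.0;
   int x = 0;
   for (int b = 0; b < n / 4; b++) {
      double u = Math.random();
      int k = 0;
      while (k < 4 && u > cdf[k]) {   // how many heads this block of 4 produced
         k++;
      }
      x += k;
   }
   for (int i = 0; i < n % 4; i++) {  // leftover tosses, handled one at a time
      if (Math.random() < p) {
         x++;
      }
   }
   return x;
}

As the answer notes, this only shrinks the constant factor; the work is still linear in n.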

Kuba
  • A generalization of this method using random bit streams instead of doubles is pretty much the best you can do. However it requires a fair bit of precomputation on the research part to find just how much entropy one needs and how to translate results. – Ordous May 09 '14 at 13:26