
To draw random numbers from a Poisson distribution in C++, it is generally advised to use

RNG_type rng; // e.g. std::mt19937 or std::mt19937_64
std::poisson_distribution<size_t> d(1e-6);
auto r = d(rng);

At each call to the std::poisson_distribution object, an entire word of random bits is consumed (e.g. 32 bits with std::mt19937, 64 bits with std::mt19937_64). It strikes me that with such a low mean (mean = 1e-6), the vast majority of the time only a few bits are needed to determine that the value to return is 0. The remaining bits could then be cached for later use.

Assuming that a sequence of bits set to true is associated with a high value returned from the Poisson distribution, when using a mean of 1e-6 any sequence not starting with 19 trues necessarily returns a zero! Indeed,

1 - 1/2^19 < P(0, 1e-6) < 1 - 1/2^20

where P(n, r) denotes the probability of drawing n from a Poisson distribution with mean r. (Indeed P(0, 1e-6) = exp(-1e-6) ≈ 1 - 1e-6, which lies between 1 - 2^-19 ≈ 1 - 1.91e-6 and 1 - 2^-20 ≈ 1 - 9.54e-7.) An algorithm that does not waste bits would use one bit half of the time, two bits a quarter of the time, three bits an eighth of the time, and so on.

Is there an algorithm out there that can improve performance by consuming as few bits as possible when drawing Poisson numbers? More generally, is there a way to improve performance over std::poisson_distribution when the mean is very low?


In response to @Jarod42, who commented

Wonder if using fewer bits don't break equiprobability...

I don't think it would break equiprobability. As a rough test, I consider the same question with a simple Bernoulli distribution: I sample true with probability 1/2^4 and false with probability 1 - 1/2^4. The function drawWithoutWastingBits stops as soon as it sees a false in the cache, while drawWastingBits consumes 4 bits regardless of their values.

#include <cassert>
#include <iostream>
#include <vector>
#include <algorithm>
#include <random>

bool drawWithoutWastingBits(std::vector<bool>& cache, size_t& cache_index)
{
    /* 
        Get a true with probability 1/2^4 (=1/16=0.0625) and a false otherwise.
        Consumes between 1 and 4 bits: it stops at the first false.
    */

    size_t nbTrues = 0;
    while (cache[cache_index])
    {
        ++nbTrues;
        ++cache_index;
        if (nbTrues == 4)
        {
            return true;
        }
    }
    ++cache_index;
    return false;
}


bool drawWastingBits(std::vector<bool>& cache, size_t& cache_index)
{
    /* 
        Get a true with probability 1/2^4 (=1/16=0.0625) and a false otherwise.
        Always consumes exactly 4 bits; returns true iff all 4 are false,
        which has the same probability as 4 consecutive trues.
    */

    bool isAnyTrue = false;
    for (size_t i = 0 ; i < 4; ++i)
    {
        if (cache[cache_index])
        {
            isAnyTrue = true;
        }
        ++cache_index;
    }
    return !isAnyTrue;
}

int main()
{
    /*
        Just cache a lot of bits in advance in `cache`. The same sequence of bits will be used by both functions.
        I am caching more than enough bits to make sure we don't run out of them below.
        I made sure to have the same number of zeros and ones, so that any deviation is caused by the methodology and not by the RNG.
    */

    // Produce cache
    std::vector<bool> cache;
    size_t nbBitsToCache = 1e7;
    cache.reserve(nbBitsToCache);
    for (size_t i = 0 ; i < nbBitsToCache/2 ; ++i)
    {
        cache.push_back(false);
        cache.push_back(true);
    }
    // Shuffle cache
    {
        std::mt19937 mt(std::random_device{}());
        std::shuffle(cache.begin(), cache.end(), mt);
    }


    // Draw without wasting bits
    {
        size_t nbDraws = 1e6;
        size_t cache_index = 0;
        std::pair<size_t, size_t> outcomes = {0,0};
        for (size_t r = 0 ; r < nbDraws ; ++r)
        {
            drawWithoutWastingBits(cache, cache_index) ? ++outcomes.first : ++outcomes.second;
            assert(cache_index <= cache.size());
        }   

        assert(outcomes.first + outcomes.second == nbDraws);
        std::cout << "Draw Without Wasting Bits: prob true = " << (double)outcomes.first / nbDraws << "\n";
    }


    // Draw wasting bits
    {
        size_t nbDraws = 1e6;
        size_t cache_index = 0;
        std::pair<size_t, size_t> outcomes = {0,0};
        for (size_t r = 0 ; r < nbDraws ; ++r)
        {
            drawWastingBits(cache, cache_index) ? ++outcomes.first : ++outcomes.second;
            assert(cache_index <= cache.size());
        }   

        assert(outcomes.first + outcomes.second == nbDraws);
        std::cout << "Draw Wit Wasting Bits: prob true = " << (double)outcomes.first / nbDraws << "\n";
    }
}

Possible output

Draw Without Wasting Bits: prob true = 0.062832
Draw Wasting Bits: prob true = 0.062363
Remi.b
  • Wonder if using fewer bits don't break equiprobability... – Jarod42 May 05 '20 at 13:48
  • @rustyx This post suggests that Mersenne Twister outperforms LCG (and so does this [Talk wikipedia page](https://en.wikipedia.org/wiki/Talk%3ALinear_congruential_generator)). I am using `std::mt19937_64` (I am already caching bits for sampling equiprobable boolean values) and have not really tried any LCG or xorshift or any other potentially faster RNG. In any case, while the production of random numbers is slow, the `std::poisson_distribution` is in itself pretty slow too. I am hoping that this can also improve once it knows the mean is very low. – Remi.b May 05 '20 at 15:00
  • Is the mean a fixed value in the application? – Peter O. May 05 '20 at 15:09
  • It's pointless to talk about `std::poisson_distribution` in a general sense, as it's implementation-defined. For all we know, there could be an implementation that does it the way you propose. I would check how it's implemented in different toolchains (Boost also has one). – rustyx May 05 '20 at 15:43
  • @PeterO. Yes, the mean is a fixed value. – Remi.b May 05 '20 at 18:35

2 Answers


Devroye's Non-Uniform Random Variate Generation, pp. 505 and 86, mentions an inversion by sequential search algorithm.

Based on that algorithm, if you know the mean is considerably less than 1, then generate a uniform random variate u in [0, 1]: the Poisson variable will be 0 if u <= exp(-mean), and greater than 0 otherwise.
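
For illustration, here is a minimal C++ sketch of that zero test (the function name is mine, and the fallback to std::poisson_distribution for the rare non-zero case is just one possible choice):

#include <cmath>
#include <random>

// Sketch: draw a Poisson(mean) variate, taking a cheap path for the
// overwhelmingly common zero case when mean is small.
template <typename URBG>
std::size_t poisson_low_mean(double mean, URBG& rng)
{
    std::uniform_real_distribution<double> u01(0.0, 1.0);
    if (u01(rng) <= std::exp(-mean))   // P(X == 0) = exp(-mean)
        return 0;
    // Rare case: fall back to a full sampler, conditioned on X >= 1.
    std::poisson_distribution<std::size_t> d(mean);
    std::size_t r;
    do { r = d(rng); } while (r == 0);
    return r;
}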

If the mean is low and you can tolerate an approximate distribution, then you can use the following approach (see Appendix A of "The Discrete Gaussian for Differential Privacy"):

  1. Express mean as a rational number numer/denom. For example, if mean is a fixed value, numer and denom can be precalculated accordingly, such as at compile time.
  2. Randomly generate a Bernoulli(numer / denom) number (generate 1 with probability numer / denom or 0 otherwise). If 1 was generated this way, repeat this step with Bernoulli(numer / (denom * 2)), Bernoulli(numer / (denom * 3)), and so on until 0 is generated this way. Generate these numbers using an algorithm that minimizes waste of bits, such as the one mentioned in Appendix B of Lumbroso's Fast Dice Roller paper (2013) or the "ZeroToOne" method modified from there and given in my section on Boolean conditions. See also this question.
  3. If step 2 produced an even number of ones, the Poisson variable is exactly 0.
  4. If step 2 produced an odd number of ones, the Poisson variable is greater than 0, and a "slower" algorithm is necessary that samples only Poisson variables greater than 0.

For example, say the mean is 1e-6 (1/1000000). Generate a Bernoulli(1/1000000) number, then Bernoulli(1/2000000), and so on, until a 0 is generated this way. If an even number of ones was generated, the Poisson variable is exactly 0. Otherwise, the Poisson variable is 1 or greater and a "slower" algorithm is necessary.
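
Here is a hedged C++ sketch of that test (identifiers are mine; for brevity each Bernoulli draw uses std::uniform_int_distribution rather than a bit-thrifty method like the Fast Dice Roller, which is where the real bit savings would come from):

#include <cstdint>
#include <random>

// Sketch: exact test of whether a Poisson(numer/denom) variate is 0,
// via the chain Bernoulli(numer/denom), Bernoulli(numer/(denom*2)), ...
// Returns true iff the variate is exactly 0.
template <typename URBG>
bool poisson_is_zero(std::uint64_t numer, std::uint64_t denom, URBG& rng)
{
    std::uint64_t ones = 0;
    for (std::uint64_t k = 1; ; ++k)   // assumes denom * k does not overflow
    {
        std::uniform_int_distribution<std::uint64_t> u(0, denom * k - 1);
        if (u(rng) < numer) ++ones;    // Bernoulli(numer / (denom * k)) gave a 1
        else break;                    // gave a 0: the chain stops
    }
    return ones % 2 == 0;              // even number of ones => variate is 0
}

For mean = 1e-6 one would call poisson_is_zero(1, 1000000, rng); on the rare false result, a sampler restricted to values 1 or greater (such as the one below) finishes the job.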

One example is the algorithm below, which is based on the one from pages 505 and 86 of Devroye's book, but samples only Poisson variables 1 or greater:

METHOD Poisson1OrGreater(mean)
 sum=Math.exp(-mean)
 prod=sum
 u=RNDRANGE(sum, 1)
 i=0
 while i==0 or u>sum
   prod*=mean/(i+1)
   sum+=prod
   i=i+1
 end
 return i
END METHOD

This method, though, is not very robust, especially since it uses numbers close to 1 (where the floating-point space is more sparse) rather than numbers close to 0.
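
For completeness, a direct C++ transcription of the pseudocode might look as follows (a sketch only, inheriting the robustness caveat above; RNDRANGE(sum, 1) is rendered with std::uniform_real_distribution):

#include <cmath>
#include <random>

// Sketch: sample a Poisson(mean) variate conditioned on being >= 1,
// by sequential search starting just above P(X == 0) = exp(-mean).
template <typename URBG>
int poisson_1_or_greater(double mean, URBG& rng)
{
    double sum  = std::exp(-mean);   // running CDF, starts at P(0)
    double prod = sum;               // current term P(i)
    double u = std::uniform_real_distribution<double>(sum, 1.0)(rng);
    int i = 0;
    while (i == 0 || u > sum)
    {
        prod *= mean / (i + 1);      // P(i+1) = P(i) * mean / (i + 1)
        sum  += prod;
        ++i;
    }
    return i;
}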


Note that the sum of n independent Poisson(mean) random variates is Poisson(mean*n) distributed (p. 501). Thus, the discussion above in this answer applies to a sum of n Poisson random variates as long as n times their mean remains small. For example, to generate a sum of 1000 Poisson random variates with a mean of 1e-6, simply generate a single Poisson random variate with a mean of 0.001. This will save considerably on calls to the pseudorandom number generator.
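
In code this is a one-liner: instead of summing n draws, draw once from Poisson(mean * n), e.g.:

#include <random>

// Sketch: the sum of n i.i.d. Poisson(mean) variates in a single call,
// valid because that sum is Poisson(mean * n) distributed.
template <typename URBG>
std::size_t poisson_sum(std::size_t n, double mean, URBG& rng)
{
    std::poisson_distribution<std::size_t> d(mean * static_cast<double>(n));
    return d(rng);
}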


There is another way to generate Poisson variates with low mean (1 or less). It is described by Duchon and Duvignau in "Preserving the number of cycles of length k in a growing uniform permutation", Electronic Journal of Combinatorics 23(4), 2016.

First, generate a Poisson(1) random variate x = Poisson1() using the algorithm given below, which uses only integer arithmetic (RNDINT(a) generates a uniform random integer in [0, a]):

METHOD Poisson1()
  ret=1; a=1; b=0
  while true // until this method returns
    j=RNDINT(a)
    if j<a and j<b: return ret
    if j==a: ret=ret+1
    else
      ret=ret-1; b=a+1
    end
    a=a+1
  end
END METHOD

Now let mean be the desired mean. Flip a coin x times, where the coin shows heads with probability equal to mean. (In other words, generate a binomial(x, mean) random variate.) The number of heads is a Poisson(mean) random variate.
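
Putting the two steps together in C++ (a sketch; RNDINT(a) becomes std::uniform_int_distribution, and the coin flips are done with std::binomial_distribution):

#include <random>

// Sketch: Poisson(1) variate via the integer-only algorithm above.
template <typename URBG>
long poisson1(URBG& rng)
{
    long ret = 1, a = 1, b = 0;
    while (true)
    {
        long j = std::uniform_int_distribution<long>(0, a)(rng); // RNDINT(a)
        if (j < a && j < b) return ret;
        if (j == a) { ++ret; }
        else        { --ret; b = a + 1; }
        ++a;
    }
}

// Sketch: Poisson(mean) for mean <= 1, by thinning a Poisson(1) variate.
template <typename URBG>
long poisson_low(double mean, URBG& rng)
{
    long x = poisson1(rng);                                 // Poisson(1)
    return std::binomial_distribution<long>(x, mean)(rng);  // heads among x flips
}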

Peter O.

You can generate the time to the next event as (-ln U) / λ, where 0 < U ≤ 1 is a uniform random number and λ is the event rate (here 1e-6).

https://preshing.com/20111007/how-to-generate-random-timings-for-a-poisson-process/
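
To make the idea concrete, here is my sketch of how the exponential gaps turn into Poisson counts (the struct and its names are mine; state carries across calls, so almost every call costs one comparison and one subtraction, and the RNG is touched only near an event):

#include <random>

// Sketch: per-unit-interval Poisson(lambda) counts from exponential gaps.
struct PoissonByGaps
{
    std::exponential_distribution<double> gap; // inter-event times, rate lambda
    double t_next = -1.0;                      // next event, relative to interval start

    explicit PoissonByGaps(double lambda) : gap(lambda) {}

    template <typename URBG>
    long operator()(URBG& rng)                 // count in the next unit interval
    {
        if (t_next < 0.0)
            t_next = gap(rng);                 // first call: place the first event
        long count = 0;
        while (t_next < 1.0)                   // events falling in this interval
        {
            ++count;
            t_next += gap(rng);
        }
        t_next -= 1.0;                         // advance to the next interval
        return count;
    }
};

With lambda = 1e-6, t_next starts around a million, so roughly a million consecutive calls return 0 before the counting loop runs at all.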

Veedrac
  • That's an exponential distribution, not a Poisson. The former is a continuous distribution, the latter is a counting distribution. They're related but different -- the Poisson distribution describes how many Poisson process events occur in a fixed length interval. – pjs May 06 '20 at 15:25
  • @pjs The exponential distribution asks a different question about *the same underlying process*. The exponential distribution just measures `1 - Pr(X = 0)`, since `λ^k e^-λ / k! = e^-λ` when `k = 0`. – Veedrac May 06 '20 at 22:40
  • @pjs Since in this case we're directly generating events over a continuous distribution, you just need to count how many fall in each interval. So if `(-ln U) / λ` returns 1248.6, and then 1.6, you know the next 1248 Poisson samples are zero, then there's a 1, then there's another 0, and then there's a 1+. This is hugely more efficient than the alternative approaches. – Veedrac May 06 '20 at 22:43
  • Yes, all of that's covered in Peter O's answer. And the code he posted from Devroye is more efficient, since it pre-calculates exp() of both sides to turn the sum of logs into a product of uniforms, thus avoiding a transcendental evaluation on every iteration. Regardless, your answer didn't actually answer the question as far as I can see. – pjs May 07 '20 at 01:39
  • @pjs No, Peter O's answer requires a new sample *every iteration*. My answer requires a new sample *every event*, which is a million times less frequent. – Veedrac May 07 '20 at 03:02
  • While his answer is academically interesting, I don't expect it to achieve much speedup. My approach should amortize down to a decrement and conditional jump per iteration. – Veedrac May 07 '20 at 03:08
  • If that's what you think, express it as an algorithm. So far all you've given is the well-known inversion algorithm to generate a single exponentially distributed inter-event time, with no explanation of how that maps to the Poisson distribution. – pjs May 07 '20 at 03:48
  • Something like `if (--next_evt) { poisson = 0; } else { t_f = floor(t); for (; floor(t) == t_f; t += (-ln U) / λ) { poisson++ }; next_evt = floor(t) - t_f; }`, though I've probably got some off-by-one errors. – Veedrac May 07 '20 at 04:11
  • @pjs It's just the algorithm I said before. If you know where all the events are, you know how many are in each interval through trivial counting. – Veedrac May 07 '20 at 04:16
  • "Something like" in a comment is not an answer. Further, what you've jotted is the inefficient version of the approach described in the first sentence of Peter's answer. As I pointed out above, this can be speeded up significantly (almost a factor of 2) by exponentiating both sides of the relationship, turning your sum of logs into a product of uniforms that gets compared to a single pre-calculated exponential since, as Peter established by questioning, the mean remains constant. – pjs May 07 '20 at 04:23
  • @pjs “Further, what you've jotted is the inefficient version of the approach described in the first sentence of Peter's answer.” → No it's not, it's completely different. Your ‘speed up’ is also irrelevant, since you're speeding up a codepath that is only hit once every million Poisson samples. It's important for Peter's answer, because Peter's answer is completely different. – Veedrac May 07 '20 at 04:25
  • Just to emphasize: His samples the RNG about once per Poisson sample. Mine samples the RNG about once per *million* samples. If this isn't clear enough idk, I tried. – Veedrac May 07 '20 at 04:33
  • "I tried." I feel the same way. You immediately go off the rails by saying lambda is a probability. It's not, it's a rate. Probabilities are measures and must be between 0 and 1. Rates are expected counts per time (or space) unit, and can be any positive number. You appear to misunderstand the relationship between the [exponential distribution](https://en.wikipedia.org/wiki/Exponential_distribution), a [Poisson processes](https://en.wikipedia.org/wiki/Poisson_point_process), and the [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution). These are related but distinct. – pjs May 07 '20 at 15:47
  • A Poisson process describes a set of independent events over time, the exponential distribution describes how much (continuous) time passes between events, and the Poisson distribution describes how many events occurred in a unit interval. Remi.b asked how to generate values from a Poisson distribution, you answered with how to generate an exponential, which is implicitly one step in generating a Poisson but is not a Poisson. In sum, you haven't answered the question. – pjs May 07 '20 at 15:48
  • @pjs Wrt ‘rate’ v. ‘probability’, I am not confused about the ground truth. That you don't like my colloquial use of the word is irrelevant to the algorithm's validity. Wrt your second comment, you're not telling me anything I don't obviously know. A sparse array and a dense array **represent the same thing**. This is my point! When your data is sparse, generate sparse representations! Even if you need to do dense math later on! – Veedrac May 07 '20 at 16:16
  • I didn't go into detail about how to count in my original post because counting is trivial and obvious, and depending on the end need might even be counterproductive. OP is clearly smart enough to know how to increment a counter himself. – Veedrac May 07 '20 at 16:18