I created a function that generates Gaussian-distributed random numbers based on the polar form of the Box-Muller transformation, as described by the pseudocode here. (I originally found this on the page archived here.) This method generates two Gaussian-distributed random numbers at a time. That means that to get full Cython speed, we need to figure out a way to pass two numbers around without turning them into Python objects. The most straightforward way to do so (that I can think of) is to pass the buffer in for direct manipulation by the generator. That's what my_gaussian_fast does, and it beats numpy by a modest margin.
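For reference, the polar Box-Muller step the Cython code below implements can be sketched in plain Python (no Cython speedup here; `polar_gaussian_pair` is a name I'm introducing for illustration):

```python
import math
import random

def polar_gaussian_pair():
    # Rejection-sample a point uniformly inside the unit disk,
    # excluding the origin so log(w) stays finite.
    while True:
        x1 = 2.0 * random.random() - 1.0
        x2 = 2.0 * random.random() - 1.0
        w = x1 * x1 + x2 * x2
        if 0.0 < w < 1.0:
            break
    # The polar Box-Muller transform maps that point to two
    # independent standard-normal deviates.
    w = math.sqrt(-2.0 * math.log(w) / w)
    return x1 * w, x2 * w
```

Note that each call produces a *pair* of deviates; the whole point of the buffer-passing trick below is to avoid throwing the second one away.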
from libc.stdlib cimport rand, RAND_MAX
from libc.math cimport log, sqrt
import numpy as np
import cython

cdef double random_uniform():
    cdef double r = rand()
    return r / RAND_MAX

cdef double random_gaussian():
    cdef double x1, x2, w
    w = 2.0
    while (w >= 1.0):
        x1 = 2.0 * random_uniform() - 1.0
        x2 = 2.0 * random_uniform() - 1.0
        w = x1 * x1 + x2 * x2
    w = sqrt((-2.0 * log(w)) / w)
    return x1 * w

@cython.boundscheck(False)
cdef void assign_random_gaussian_pair(double[:] out, int assign_ix):
    cdef double x1, x2, w
    w = 2.0
    while (w >= 1.0):
        x1 = 2.0 * random_uniform() - 1.0
        x2 = 2.0 * random_uniform() - 1.0
        w = x1 * x1 + x2 * x2
    w = sqrt((-2.0 * log(w)) / w)
    out[assign_ix] = x1 * w
    out[assign_ix + 1] = x2 * w

@cython.boundscheck(False)
def my_uniform(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n):
        result[i] = random_uniform()
    return result

@cython.boundscheck(False)
def my_gaussian(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n):
        result[i] = random_gaussian()
    return result

@cython.boundscheck(False)
def my_gaussian_fast(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n // 2):  # Int division truncates, leaving a trailing index when n is odd.
        assign_random_gaussian_pair(result, i * 2)
    if n % 2 == 1:
        result[n - 1] = random_gaussian()
    return result
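To reproduce the timings, the module has to be compiled first. A minimal build script might look like this (the filenames `setup.py` and `example.pyx` are assumptions; save the code above as `example.pyx` and run `python setup.py build_ext --inplace`):

```python
# setup.py -- minimal Cython build configuration (filenames assumed)
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("example.pyx"))
```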
Tests. Here's a uniform benchmark:
In [3]: %timeit numpy.random.uniform(size=10000)
10000 loops, best of 3: 130 µs per loop
In [4]: %timeit numpy.array(example.my_uniform(10000))
10000 loops, best of 3: 85.4 µs per loop
So this is definitely faster than numpy for ordinary random numbers. And if we're smart about it, it's faster for gaussian random numbers too:
In [5]: %timeit numpy.random.normal(size=10000)
1000 loops, best of 3: 393 µs per loop
In [6]: %timeit numpy.array(example.my_gaussian(10000))
1000 loops, best of 3: 542 µs per loop
In [7]: %timeit numpy.array(example.my_gaussian_fast(10000))
1000 loops, best of 3: 266 µs per loop
As confirmed by Robert Kern, numpy uses both values generated. my_gaussian throws one away; my_gaussian_fast uses both and stores them quickly. (See this answer's history for a naive my_gaussian_pair that tries to return the pair in a slow way.)