I created a function that generates Gaussian-distributed random numbers based on the polar form of the Box-Muller transformation, as described by the pseudocode here. (I originally found this on the page archived here.) This method generates two Gaussian-distributed random numbers at a time. That means that to get full Cython speed, we need to figure out a way to pass two numbers around without turning them into Python objects. The most straightforward way to do so (that I can think of) is to pass the buffer in for direct manipulation by the generator. That's what my_gaussian_fast does, and it beats numpy by a modest margin.
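For reference, the polar Box-Muller step the Cython code below implements can be sketched in plain Python (no Cython speedup here; `polar_gaussian_pair` is a name I'm introducing for illustration):

```python
import math
import random

def polar_gaussian_pair():
    # Rejection-sample a point uniformly inside the unit disk,
    # excluding the origin so log(w) stays finite.
    while True:
        x1 = 2.0 * random.random() - 1.0
        x2 = 2.0 * random.random() - 1.0
        w = x1 * x1 + x2 * x2
        if 0.0 < w < 1.0:
            break
    # The polar Box-Muller transform maps that point to two
    # independent standard-normal deviates.
    w = math.sqrt(-2.0 * math.log(w) / w)
    return x1 * w, x2 * w
```

Note that each call produces a *pair* of deviates; the whole point of the buffer-passing trick below is to avoid throwing the second one away.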
from libc.stdlib cimport rand, RAND_MAX
from libc.math cimport log, sqrt
import numpy as np
import cython

cdef double random_uniform():
    cdef double r = rand()
    return r / RAND_MAX

cdef double random_gaussian():
    cdef double x1, x2, w
    w = 2.0
    while (w >= 1.0):
        x1 = 2.0 * random_uniform() - 1.0
        x2 = 2.0 * random_uniform() - 1.0
        w = x1 * x1 + x2 * x2
    w = sqrt((-2.0 * log(w)) / w)
    return x1 * w

@cython.boundscheck(False)
cdef void assign_random_gaussian_pair(double[:] out, int assign_ix):
    cdef double x1, x2, w
    w = 2.0
    while (w >= 1.0):
        x1 = 2.0 * random_uniform() - 1.0
        x2 = 2.0 * random_uniform() - 1.0
        w = x1 * x1 + x2 * x2
    w = sqrt((-2.0 * log(w)) / w)
    out[assign_ix] = x1 * w
    out[assign_ix + 1] = x2 * w

@cython.boundscheck(False)
def my_uniform(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n):
        result[i] = random_uniform()
    return result

@cython.boundscheck(False)
def my_gaussian(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n):
        result[i] = random_gaussian()
    return result

@cython.boundscheck(False)
def my_gaussian_fast(int n):
    cdef int i
    cdef double[:] result = np.zeros(n, dtype='f8', order='C')
    for i in range(n // 2):  # Int division truncates, leaving a trailing index when n is odd.
        assign_random_gaussian_pair(result, i * 2)
    if n % 2 == 1:
        result[n - 1] = random_gaussian()
    return result
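To reproduce the timings, the module has to be compiled first. A minimal build script might look like this (the filenames `setup.py` and `example.pyx` are assumptions; save the code above as `example.pyx` and run `python setup.py build_ext --inplace`):

```python
# setup.py -- minimal Cython build configuration (filenames assumed)
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("example.pyx"))
```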
Tests. Here's a uniform benchmark:
In [3]: %timeit numpy.random.uniform(size=10000)
10000 loops, best of 3: 130 µs per loop
In [4]: %timeit numpy.array(example.my_uniform(10000))
10000 loops, best of 3: 85.4 µs per loop
So this is definitely faster than numpy for ordinary random numbers. And if we're smart about it, it's faster for gaussian random numbers too:
In [5]: %timeit numpy.random.normal(size=10000)
1000 loops, best of 3: 393 µs per loop
In [6]: %timeit numpy.array(example.my_gaussian(10000))
1000 loops, best of 3: 542 µs per loop
In [7]: %timeit numpy.array(example.my_gaussian_fast(10000))
1000 loops, best of 3: 266 µs per loop
As confirmed by Robert Kern, numpy uses both values generated. my_gaussian throws one away; my_gaussian_fast uses both and stores them quickly. (See this answer's history for a naive my_gaussian_pair that tries to return the pair in a slow way.)