Change your if statement to (s >= 1.0 || s == 0.0)
. Better yet, use a break
as seen in the following example for a SIMD Gaussian random number generating returning a complex pair (u,v). This uses the Mersenne twister random number generator dsfmt()
. If you only want a single, real, random-number, return only u
and save the v
for the next pass.
inline static void randn(double *u, double *v)
{
double s, x, y; // SIMD Marsaglia polar version for complex u and v
while (1){
x = dsfmt_genrand_close_open(&dsfmt) - 1.;
y = dsfmt_genrand_close_open(&dsfmt) - 1.;
s = x*x + y*y;
if (s < 1) break;
}
s = sqrt(-2.0*log(s)/s);
*u = x*s; *v = y*s;
return;
}
This algorithm is surprisingly fast. Execution times for computing two random numbers (u,v) for four different Gaussian random number generators are:
Times for delivering two Gaussian numbers (u + iv)
i7-2600K @ 4GHz, gcc -Wall -Ofast -msse2 ..
gsl_ziggurat = 20.3 (ns)
Box-Muller = 78.8 (ns)
Box-Muller with fast_sin fast_cos = 28.1 (ns)
SIMD Marsaglia polar = 35.0 (ns)
The fast_sin and fast_cos polynomial routines of Charles K. Garrett speed up the Box-Muller computation by a factor 2.9 using a nested polynomial implementation of cos() and sin(). The SIMD Box Muller and polar algorithms are certainly competitive. Also they can be parallelized easily. Using gcc -Ofast -S, the assembly code dump shows that the square root is the SIMD SSE2: sqrt --> sqrtsd %xmm0, %xmm0
Comment: it is really hard and frustrating to get accurate timings with gcc5, but I think these are ok: as of 2/3/2016: DLW
[1] Related link: c malloc array pointer return in cython
[2] A comparison of algorithms, but not necessarily for SIMD versions: http://www.doc.ic.ac.uk/~wl/papers/07/csur07dt.pdf
[3] Charles K. Garrett: http://krisgarrett.net/papers/l2approx.pdf