4

The standard C math library does not provide a function to compute the CDF of the standard normal distribution, normcdf(). It does, however, offer closely related functions: the error function, erf() and the complementary error function, erfc(). The fastest way to compute the CDF is often via the error function, using the predefined constant M_SQRT1_2 to represent √½:

double normcdf (double a) 
{
    return 0.5 + 0.5 * erf (M_SQRT1_2 * a);
}

Clearly, this suffers from massive subtractive cancellation in the negative half-plane and is not suitable for the majority of applications. Since the cancellation issue is easily avoided by use of erfc(), which however is typically of somewhat lower performance than erf(), the most frequently recommended computation is:

double normcdf (double a) 
{
    return 0.5 * erfc (-M_SQRT1_2 * a);
}

Some testing shows that the maximum ulp error incurred in the negative half-plane is still rather large however. Using a double-precision implementation of erfc() accurate to 0.51 ulps, one can observe errors of up to 1705.44 ulps in normcdf(). The issue here is that the computational error in the input to erfc() is magnified by the exponential scaling that is inherent to erfc() (See this answer for an explanation of error magnification caused by exponentiation).

The following paper shows how one can achieve (nearly) correctly rounded, products when multiplying floating-point operands with arbitrary-precision constants such as √½:

Nicolas Brisebarre and Jean-Michel Muller, "Correctly rounded multiplication by arbitrary precision constants", IEEE Transactions on Computers, Vol. 57, No. 2, February 2008, pp. 165-174

The method advocated by the paper relies on the fused multiply-add operation, which is available on recent implementations of all common processor architectures, and is exposed in C via the standard math function fma(). This leads to the following version:

double normcdf (double a) 
{
    double SQRT_HALF_HI =  0x1.6a09e667f3bcd0p-01; //  7.0710678118654757e-01
    double SQRT_HALF_LO = -0x1.bdd3413b264560p-55; // -4.8336466567264567e-17

    return 0.5 * erfc (fma (-SQRT_HALF_HI, a, -SQRT_HALF_LO * a));
}

Tests show that this cuts the maximum error roughly in half compared to the previous version. Using the same highly-accurate erfc() implementation as before, the maximum observed error is 842.71 ulps. This is still far away from the usual goal of providing basic mathematical functions with an error of at most a few ulps.

Is there an efficient method that allows for the accurate computation of normcdf(), and that uses only functions available in the standard C math library?

Community
  • 1
  • 1
njuffa
  • 23,970
  • 4
  • 78
  • 130
  • Are you on a system where `long double` provides more precision than `double`? If so, does using `erfl()` and `erfcl()` provide any assistance? – Jonathan Leffler Jun 18 '16 at 23:16
  • @JonathanLeffler My currently used platforms either do not support `long double` or map `long double` to `double`. Otherwise, with `long double` mapped to 80-bit extended precision, double-double, or quadruple precision, my expectation would be that the simple formula based on `erfcl` would deliver results accurate to double precision, but I don't have a way of demonstrating that right now. On the other hand, even assuming computation via `erfl()` maps to full IEEE-754 quadruple precision, massive cancellation will lead to inaccurate results in the negative half-plane. – njuffa Jun 18 '16 at 23:24

1 Answers1

2

One way around the accuracy limitations of the approaches outlined in the question is the limited use of double-double computation. This involves the computation of -sqrt (0.5) * a as a pair of double variables h and l in head/tail fashion. The high-order portion h of the product is passed to erfc(), while the low-order portion l is then used to interpolate the erfc() result, based on the local slope of the complementary error function at h.

The derivative of erfc(x) is -2 * exp (-x * x) / √π. However, one would like to avoid the fairly expensive computation of exp(-x * x). It is known that for x > 0, erfc(x) ~= 2 * exp (-x * x) / (√π * (x + sqrt (x* x + 4/π)). Therefore, asymptotically, erfc'(x) ~= -2 * x * erfc(x), and it follows that for |l| ≪|h|, erfc (h+l) ~= erfc (h) - 2 * h * l * erfc(h). The negation of the latter term can easily be pulled into the computation of l. One arrives at the following implementation for double precision (using IEEE-754 binary64):

double my_normcdf (double a)
{
    double h, l, r;
    const double SQRT_HALF_HI =  0x1.6a09e667f3bcd0p-01; //  7.0710678118654757e-01
    const double SQRT_HALF_LO = -0x1.bdd3413b264560p-55; // -4.8336466567264567e-17

    /* clamp input as normcdf(x) is either 0 or 1 asymptotically */
    if (fabs (a) > 38.625) a = (a < 0.0) ? -38.625 : 38.625;

    h = fma (-SQRT_HALF_HI, a, -SQRT_HALF_LO * a);
    l = fma (SQRT_HALF_LO, a, fma (SQRT_HALF_HI, a, h));
    r = erfc (h);
    if (h > 0.0) r = fma (2.0 * h * l, r, r);
    return 0.5 * r;
}

The maximum observed error, using the same erfc() implementation as used before, is 1.96 ulps. The corresponding single-precision implementation (using IEEE-754 binary32) is:

float my_normcdff (float a)
{
    float h, l, r;
    const float SQRT_HALF_HI = 0x1.6a09e6p-01f; // 7.07106769e-1
    const float SQRT_HALF_LO = 0x1.9fcef4p-27f; // 1.21016175e-8

    /* clamp input as normcdf(x) is either 0 or 1 asymptotically */
    if (fabsf (a) > 14.171875f) a = (a < 0.0f) ? -14.171875f : 14.171875f;

    h = fmaf (-SQRT_HALF_HI, a, -SQRT_HALF_LO * a);
    l = fmaf (SQRT_HALF_LO, a, fmaf (SQRT_HALF_HI, a, h));
    r = erfcf (h);
    if (h > 0.0f) r = fmaf (2.0f * h * l, r, r);
    return 0.5f * r;
}
njuffa
  • 23,970
  • 4
  • 78
  • 130