In previous answers I have shown how to use the C standard math library to efficiently and accurately compute the PDF of the standard normal distribution, normpdf()
, the CDF of the standard normal distribution, normcdf()
, and the scaled complementary error function erfcx()
. Based of these three implementations, one could easily code the computation of the Mills ratio in straightforward manner in one of the following two ways:
double my_mills_ratio_1 (double a)
{
return my_normcdf (-a) / my_normpdf (a);
}
double my_mills_ratio_2 (double a)
{
const double SQRT_HALF_HI = 0x1.6a09e667f3bccp-01; // 1/sqrt(2), msbs;
const double SQRT_HALF_LO = 0x1.21165f626cdd5p-54; // 1/sqrt(2), lsbs;
const double SQRT_PIO2_HI = 0x1.40d931ff62705p+00; // sqrt(pi/2), msbs;
const double SQRT_PIO2_LO = 0x1.2caf9483f5ce4p-53; // sqrt(pi/2), lsbs;
double r;
a = fma (SQRT_HALF_HI, a, SQRT_HALF_LO * a);
r = my_erfcx (a);
return fma (SQRT_PIO2_HI, r, SQRT_PIO2_LO * r);
}
However, both of these approaches are numerically flawed. For mills_ratio_1()
, both the PDF term and CDF term vanish rapidly in the positive half-plane as the magnitude of the argument increases. In IEEE-754 double precision both become zero around a
= 38, leading to a NaN result due to a division of zero by zero. With regard to my_mills_ratio_2()
, the exponential growth in the negative half-plane leads to error magnification and therefore large ulp errors. One way to fix this is to simply combine the well-behaved portions of each of the two approximations:
double my_mills_ratio_3 (double a)
{
return (a < 0) ? my_mills_ratio_1 (a) : my_mills_ratio_2 (a);
}
This works reasonably well. Using the Intel compiler version 13.1.3.198 to build the code I presented in my previous answers, using 4 billion test vectors, a maximum error of 2.79346 ulps is observed in the positive half-plane, while a maximum error of 6.81248 ulps is observed in the negative half-plane. The somewhat larger errors in the negative half-plane occur for large results close to overflow, because at that point the values of the PDF are so small that they are represented as subnormal double-precision numbers with reduced accuracy.
One alternative solution is to address the error magnification issues affecting my_mills_ratio_2()
in the negative half-plane. One can do so by computing the argument to erfcx()
to better than double precision, and using the low-order bits of this argument for linear interpolation of the erfcx()
result.
For this, one also needs the slope of erfcx(x), which is erfcx'(x) = 2x erfcx(x) - 2/√π. Availability of the FMA (fused multiply-add) operation via the C standard math function fma()
provides for the efficient implementation of this quasi double-double computation. The risk of overflow in the magnitude of the slope during intermediate computations can be avoided by local rescaling.
The resulting implementation has an error of less than 4 ulps across the entire input domain:
/* Compute Mills ratio of the standard normal distribution:
*
* M(x) = normcdf(-x)/normpdf(x) = sqrt(pi/2) * erfcx(x/sqrt(2))
*
* maximum ulp error in positive half-plane: 2.79346
* maximum ulp error in negative half-plane: 3.90753
*/
double my_mills_ratio (double a)
{
double s, t, r, h, l;
const double SQRT_HALF_HI = 0x1.6a09e667f3bccp-01; // 1/sqrt(2), msbs
const double SQRT_HALF_LO = 0x1.21165f626cdd5p-54; // 1/sqrt(2), lsbs
const double SQRT_PIO2_HI = 0x1.40d931ff62705p+00; // sqrt(pi/2), msbs
const double SQRT_PIO2_LO = 0x1.2caf9483f5ce4p-53; // sqrt(pi/2), lsbs
const double TWO_RSQRT_PI = 0x1.20dd750429b6dp+00; // 2/sqrt(pi)
const double MAX_IEEE_DBL = 0x1.fffffffffffffp+1023;
const double SCALE_DOWN = 0.03125; // prevent ovrfl in intermed. computation
const double SCALE_UP = 1.0 / SCALE_DOWN;
// Compute argument a/sqrt(2) as a head-tail pair of doubles h:l
h = fma (SQRT_HALF_HI, a, SQRT_HALF_LO * a);
l = fma (-SQRT_HALF_LO, a, fma (-SQRT_HALF_HI, a, h));
// Compute scaled complementary error function for argument "head"
t = my_erfcx (h);
// Enhance accuracy if in negative half-plane, if result has not overflowed
if ((a < -1.0) && (t <= MAX_IEEE_DBL)) {
// Compute slope: erfcx'(x) = 2x * erfcx(x) - 2/sqrt(pi)
s = fma (h, t * SCALE_DOWN, -TWO_RSQRT_PI * SCALE_DOWN); // slope
// Linearly interpolate result based on derivative and argument "tail"
t = fma (s, -2.0 * SCALE_UP * l, t);
}
// Scale by sqrt(pi/2) for final result
r = fma (SQRT_PIO2_HI, t, SQRT_PIO2_LO * t);
return r;
}
The single-precision implementation looks almost identical, except for the constants involved:
/* Compute Mills ratio of the standard normal distribution:
*
* M(x) = normcdf(-x)/normpdf(x) = sqrt(pi/2) * erfcx(x/sqrt(2))
*
* maximum ulp error in positive half-plane: 2.41987
* maximum ulp error in negative half-plane: 3.39521
*/
float my_mills_ratio_f (float a)
{
float h, l, r, s, t;
const float SQRT_HALF_HI = 0x1.6a09e6p-01f; // sqrt(1/2), msbs
const float SQRT_HALF_LO = 0x1.9fcef4p-27f; // sqrt(1/2), lsbs
const float SQRT_PIO2_HI = 0x1.40d930p+00f; // sqrt(pi/2), msbs
const float SQRT_PIO2_LO = 0x1.ff6270p-24f; // sqrt(pi/2), lsbs
const float TWO_RSQRT_PI = 0x1.20dd76p+00f; // 2/sqrt(pi)
const float MAX_IEEE_FLT = 0x1.fffffep+127f;
const float SCALE_DOWN = 0.0625f; // prevent ovrfl in intermed. computation
const float SCALE_UP = 1.0f / SCALE_DOWN;
// Compute argument a/sqrt(2) as a head-tail pair of floats h:l
h = fmaf (SQRT_HALF_HI, a, SQRT_HALF_LO * a);
l = fmaf (-SQRT_HALF_LO, a, fmaf (-SQRT_HALF_HI, a, h));
// Compute scaled complementary error function for argument "head"
t = my_erfcxf (h);
// Enhance accuracy if in negative half-plane, if result has not overflowed
if ((a < -1.0f) && (t <= MAX_IEEE_FLT)) {
// Compute slope: erfcx'(x) = 2x * erfcx(x) - 2/sqrt(pi)
s = fmaf (h, t * SCALE_DOWN, -TWO_RSQRT_PI * SCALE_DOWN);
// Linearly interpolate result based on derivative and argument "tail"
t = fmaf (s, -2.0f * SCALE_UP * l, t);
}
// Scale by sqrt(pi/2) for final result
r = fmaf (SQRT_PIO2_HI, t, SQRT_PIO2_LO * t);
return r;
}