Mark Dickinson provides a good answer for the general case; I will add to that with a somewhat more specialized approach.
Many computing environments these days provide an operation called fused multiply-add, or FMA for short, which was specifically designed with situations like this in mind. In the computation of fma(a, b, c), the full product a * b (untruncated and unrounded) enters into the addition with c, then a single rounding is applied at the end.
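As a quick illustration of this single-rounding property (a minimal sketch of my own, not part of the computation at hand), fma (a, b, -p) recovers the exact rounding error of a product p = a * b, something a separate multiply and add cannot do:

#include <stdio.h>
#include <math.h>

int main (void)
{
    double a = 1.0 + 0x1.0p-27; /* 1 + 2^-27 */
    double p = a * a;           /* product, rounded once */
    double e = fma (a, a, -p);  /* exact residual a*a - p, since the product enters unrounded */
    printf ("p = %.17g  residual = %g\n", p, e);
    return 0;
}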
Currently shipping GPUs and CPUs, including those based on the ARM64, x86-64, and Power architectures, typically include a fast hardware implementation of FMA, which is exposed in programming languages of the C and C++ families, as well as many others, as the standard math function fma(). Some, usually older, software environments use software emulation of FMA, and some of these emulations have been found to be faulty. In addition, such emulations tend to be quite slow.
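One coarse, portable way to check whether a toolchain maps fma() to fast hardware is the optional C99 macro FP_FAST_FMA from math.h (FP_FAST_FMAF for the float variant), which implementations define when fma() is about as fast as a multiply followed by an add:

#include <stdio.h>
#include <math.h>

int main (void)
{
#if defined(FP_FAST_FMAF)
    puts ("fmaf() is expected to map to fast (hardware) FMA");
#else
    puts ("fmaf() may fall back to (possibly slow) software emulation");
#endif
    return 0;
}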
Where FMA is available, the expression can be evaluated in a numerically stable fashion, and without risk of premature overflow and underflow, as fabs (b * c) / sqrt (fma (c, -c, 1.0)), where fabs() is the absolute value operation for floating-point operands and sqrt() computes the square root. Some environments also offer a reciprocal square root operation, often called rsqrt(), in which case a potential alternative is fabs (b * c) * rsqrt (fma (c, -c, 1.0)). The use of rsqrt() avoids the relatively expensive division and is therefore typically faster. However, many implementations of rsqrt() are not correctly rounded like sqrt(), so accuracy may be somewhat worse.
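For concreteness, in a CUDA-like environment that provides rsqrtf(), the float version of the rsqrt-based alternative might look as follows (a sketch only; rsqrtf() there is fast but not correctly rounded):

__device__ float func_rsqrt (float b, float c)
{
    /* |b*c| * (1 - c*c)^(-1/2); fmaf() computes 1 - c*c with a single rounding */
    return fabsf (b * c) * rsqrtf (fmaf (c, -c, 1.0f));
}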
A quick experiment with the code below seems to indicate that the maximum error of the FMA-based expression is about 3 ulps, as long as b is a normal floating-point number. I stress that this does not prove any error bound. The automated Herbie tool, which tries to find numerically advantageous rewrites of a given floating-point expression, suggests using fabs (b * c) * sqrt (1.0 / fma (c, -c, 1.0)). This seems to be a spurious result, however, as I can neither think of any particular advantage nor find one experimentally.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <math.h>
#define USE_ORIGINAL (0)
#define USE_HERBIE (1)
/* function under test */
float func (float b, float c)
{
#if USE_HERBIE
return fabsf (b * c) * sqrtf (1.0f / fmaf (c, -c, 1.0f));
#else // USE_HERBIE
return fabsf (b * c) / sqrtf (fmaf (c, -c, 1.0f));
#endif // USE_HERBIE
}
/* reference */
double funcd (double b, double c)
{
#if USE_ORIGINAL
double b2 = b * b;
double c2 = c * c;
return sqrt ((b2 * c2) / (1.0 - c2));
#else
return fabs (b * c) / sqrt (fma (c, -c, 1.0));
#endif
}
uint32_t float_as_uint32 (float a)
{
uint32_t r;
memcpy (&r, &a, sizeof r);
return r;
}
float uint32_as_float (uint32_t a)
{
float r;
memcpy (&r, &a, sizeof r);
return r;
}
uint64_t double_as_uint64 (double a)
{
uint64_t r;
memcpy (&r, &a, sizeof r);
return r;
}
double floatUlpErr (float res, double ref)
{
uint64_t i, j, err, refi;
int expoRef;
/* ulp error cannot be computed if either operand is NaN, infinity, zero */
if (isnan (res) || isnan (ref) || isinf (res) || isinf (ref) ||
(res == 0.0f) || (ref == 0.0f)) {
return 0.0;
}
/* Convert the float result to an "extended float". This is like a float
with 56 instead of 24 effective mantissa bits.
*/
i = ((uint64_t)float_as_uint32(res)) << 32;
/* Convert the double reference to an "extended float". If the reference is
>= 2^129, we need to clamp to the maximum "extended float". If reference
is < 2^-126, we need to denormalize because of the float type's limited
exponent range.
*/
refi = double_as_uint64(ref);
expoRef = (int)(((refi >> 52) & 0x7ff) - 1023);
if (expoRef >= 129) {
j = 0x7fffffffffffffffULL;
} else if (expoRef < -126) {
j = ((refi << 11) | 0x8000000000000000ULL) >> 8;
j = j >> (-(expoRef + 126));
} else {
j = ((refi << 11) & 0x7fffffffffffffffULL) >> 8;
j = j | ((uint64_t)(expoRef + 127) << 55);
}
j = j | (refi & 0x8000000000000000ULL);
err = (i < j) ? (j - i) : (i - j);
return err / 4294967296.0;
}
// Fixes via: Greg Rose, KISS: A Bit Too Simple. http://eprint.iacr.org/2011/007
static unsigned int z=362436069,w=521288629,jsr=362436069,jcong=123456789;
#define znew (z=36969*(z&0xffff)+(z>>16))
#define wnew (w=18000*(w&0xffff)+(w>>16))
#define MWC ((znew<<16)+wnew)
#define SHR3 (jsr^=(jsr<<13),jsr^=(jsr>>17),jsr^=(jsr<<5)) /* 2^32-1 */
#define CONG (jcong=69069*jcong+13579) /* 2^32 */
#define KISS ((MWC^CONG)+SHR3)
#define N (20)
int main (void)
{
float b, c, errloc_b = 0.0f, errloc_c = 0.0f, res;
double ref, err, maxerr = 0;
c = -1.0f;
while (c <= 1.0f) {
/* try N random values of `b` for every value of `c` */
for (int i = 0; i < N; i++) {
/* allow only normals */
do {
b = uint32_as_float (KISS);
} while (!isnormal (b));
res = func (b, c);
ref = funcd ((double)b, (double)c);
err = floatUlpErr (res, ref);
if (err > maxerr) {
maxerr = err;
errloc_b = b;
errloc_c = c;
}
}
c = nextafterf (c, INFINITY);
}
#if USE_HERBIE
printf ("HERBIE max ulp err = %.5f @ (b=% 15.8e c=% 15.8e)\n", maxerr, errloc_b, errloc_c);
#else // USE_HERBIE
printf ("SIMPLE max ulp err = %.5f @ (b=% 15.8e c=% 15.8e)\n", maxerr, errloc_b, errloc_c);
#endif // USE_HERBIE
return EXIT_SUCCESS;
}
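For reference, something along these lines should build and run the test program with gcc (the file name is arbitrary):

gcc -O2 fma_test.c -o fma_test -lm
./fma_test

Toggling USE_HERBIE between 0 and 1 selects which of the two expressions is tested; the loop steps through every float value of c in [-1, 1] and tries N random normal values of b for each.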