
The function sinpi(x) computes sin(πx), and the function cospi(x) computes cos(πx); the multiplication with π is implicit inside the functions. These functions were first introduced as an extension to the C standard math library by Sun Microsystems in the late 1980s. IEEE Std 754™-2008 specifies the equivalent functions sinPi and cosPi in section 9.

There are numerous computations in which sin(πx) and cos(πx) occur naturally. A very simple example is the Box-Muller transform (G. E. P. Box and Mervin E. Muller, "A Note on the Generation of Random Normal Deviates", The Annals of Mathematical Statistics, Vol. 29, No. 2, pp. 610-611), which, given two independent random variables U₁ and U₂ uniformly distributed on (0, 1), produces independent random variables Z₁ and Z₂ with standard normal distribution:

Z₁ = √(-2 ln U₁) cos (2 π U₂)
Z₂ = √(-2 ln U₁) sin (2 π U₂)
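
As a minimal sketch, the transform could be coded as follows, assuming sinpi() and cospi() are available (rand() merely stands in for a real uniform random generator):

#include <math.h>
#include <stdlib.h>

/* Box-Muller transform; assumes sinpi() and cospi() are available.
   rand() is a placeholder for a proper uniform random generator. */
void box_muller (double *z1, double *z2)
{
    double u1 = (rand () + 1.0) / ((double)RAND_MAX + 1.0); // in (0,1]
    double u2 = rand () / ((double)RAND_MAX + 1.0);         // in [0,1)
    double r = sqrt (-2.0 * log (u1));
    *z1 = r * cospi (2.0 * u2); // cos(2πU₂) without explicit π
    *z2 = r * sinpi (2.0 * u2); // sin(2πU₂) without explicit π
}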

A further example is the computation of sine and cosine for degree arguments, as in this computation of great-circle distance using the Haversine formula:

/* This function computes the great-circle distance of two points on earth 
   using the Haversine formula, assuming spherical shape of the planet. A 
   well-known numerical issue with the formula is reduced accuracy in the 
   case of near antipodal points.

   lat1, lat2  latitudes of the two points, in degrees [-90,+90]
   lon1, lon2  longitudes of the two points, in degrees [-180,+180]
   radius      radius of the earth in user-defined units, e.g. 6378.2 km or 
               3963.2 miles

   returns:    distance of the two points, in the same units as radius

   Reference: http://en.wikipedia.org/wiki/Great-circle_distance
*/
double haversine (double lat1, double lon1, double lat2, double lon2, double radius)
{
    double dlat, dlon, c1, c2, d1, d2, a, c, t;

    c1 = cospi (lat1 / 180.0);
    c2 = cospi (lat2 / 180.0);
    dlat = lat2 - lat1;
    dlon = lon2 - lon1;
    d1 = sinpi (dlat / 360.0);
    d2 = sinpi (dlon / 360.0);
    t = d2 * d2 * c1 * c2;
    a = d1 * d1 + t;
    c = 2.0 * asin (fmin (1.0, sqrt (a)));
    return radius * c;
}
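
A hypothetical usage example (the coordinates are rough approximations of San Francisco and New York City):

/* great-circle distance, on the order of 4100 km */
double d = haversine (37.77, -122.42, 40.71, -74.01, 6378.2);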

For C++, the Boost library provides sin_pi and cos_pi, and some vendors offer sinpi and cospi functionality as extensions in system libraries. For example, Apple added __sinpi, __cospi and the corresponding single-precision versions __sinpif, __cospif to iOS 7 and OS X 10.9 (presentation, slide 101). But for many other platforms, there is no implementation readily accessible to C programs.

Compared with a traditional approach that computes sin (M_PI * x) and cos (M_PI * x), the use of sinpi and cospi improves accuracy, because the explicit multiplication with M_PI, which is only an approximation to π, and its attendant rounding error are avoided; it also offers performance advantages due to the much simpler argument reduction.
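
The accuracy point is easy to demonstrate: since M_PI is merely π rounded to double precision, sin (M_PI) does not return zero. A minimal sketch:

#include <math.h>
#include <stdio.h>

int main (void)
{
    /* M_PI differs from π by about 1.2e-16, so the argument handed to
       sin() is already inexact before sin() runs; a correct sinpi(1.0)
       returns exactly +0.0 */
    printf ("%.17e\n", sin (M_PI)); // roughly 1.2246e-16, not 0
    return 0;
}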

How can one use the standard C math library to implement sinpi() and cospi() functionality in a reasonably efficient and standard-compliant fashion?

njuffa
  • For maximum accuracy and portability simultaneously, it seems to me that temporarily changing the rounding mode (using e.g. [`fenv()`](http://man7.org/linux/man-pages/man3/fenv.3.html) or `fesetround()`) to truncate/round-towards-zero is necessary. That way we can use e.g. Kahan sum/compensated sum, and split high-precision coefficients to several different limited-precision factors. Every other approach seems to rely on specific hardware (like `fma()`, for which emulation is horribly slow) or implementation details. – Nominal Animal Mar 14 '17 at 19:00
  • @NominalAnimal I did not target maximum portability as this is not something I need. I pointed out various potential sticking points in my answer for people who want to address them in their own implementations. As for FMA, it is available as a hardware instruction on recent (roughly past 5 years) x86 and ARM processors, and of course on Power[PC] since the 1990s. If someone would like to provide an answer with code optimized for FMA-less hardware platforms, I would be happy to upvote it (and give an added bonus if it is really good). – njuffa Mar 14 '17 at 19:25

1 Answer


For simplicity, I will focus on sincospi(), which simultaneously provides both the sine and the cosine result; sinpi() and cospi() can then be constructed as wrapper functions that discard the result that is not needed, as sketched below. In many applications, the handling of floating-point status flags (see fenv.h) is not required, nor is error reporting via errno needed most of the time, so I will omit these.
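
For example, a minimal sketch of such wrappers around the my_sincospi() shown below:

double my_sinpi (double a)
{
    double s, c;
    my_sincospi (a, &s, &c);
    return s; // discard the cosine result
}

double my_cospi (double a)
{
    double s, c;
    my_sincospi (a, &s, &c);
    return c; // discard the sine result
}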

The basic algorithmic structure is straightforward. Arguments of very large magnitude are always even integers, so πx is a multiple of 2π for such arguments, and their sine and cosine values are therefore known (0 and 1, respectively). Other arguments are folded into the range [-¼,+¼] while recording quadrant information. Polynomial minimax approximations are used to compute sine and cosine on this primary approximation interval. Finally, the quadrant information is used to map the preliminary results to the final results by a cyclical exchange of results and sign changes, as in the worked example below.
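
As a worked example of the reduction and mapping, take a = 0.7 (decimals rounded for readability):

r = nearbyint (0.7 + 0.7) = 1.0          quadrant index i = 1
t = fma (-0.5, 1.0, 0.7) ≈ 0.2           reduced argument in [-¼,+¼]
i & 1 set:  sin (π·0.7) =  cos (π·0.2)   cyclical exchange of results
            cos (π·0.7) = -sin (π·0.2)   with sign change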

The correct handling of special operands (in particular -0, infinities, and NaNs) requires the compiler to apply only optimizations that comply with IEEE-754 rules. It may not transform x*0.0 into 0.0 (this is not correct for -0, infinities, and NaNs) nor may it optimize 0.0-x into -x as negation is a bit-level operation according to section 5.5.1 of IEEE-754 (yielding different results for zeros and NaNs). Most compilers will offer a flag that enforces the use of "safe" transformations, e.g. -fp-model=precise for the Intel C/C++ compiler.
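
The zero-sign difference between the two operations can be observed with a minimal sketch (volatile serves only to discourage constant folding):

#include <stdio.h>

int main (void)
{
    volatile double x = 0.0;
    /* in round-to-nearest, 0.0 - (+0.0) yields +0.0, whereas the
       negation -(+0.0) yields -0.0 */
    printf ("0.0 - x = %g, -x = %g\n", 0.0 - x, -x); // prints 0, -0
    return 0;
}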

One additional caveat applies to the use of the nearbyint function during argument reduction. Like rint, this function rounds according to the current rounding mode. When fenv.h isn't used, the rounding mode defaults to round-to-nearest-or-even. When it is used, there is a risk that a directed rounding mode is in effect. This could be worked around by using round, which always rounds "to nearest, ties away from zero" independent of the current rounding mode; however, that function tends to be slower since it is not backed by an equivalent machine instruction on most processor architectures. A sketch of an alternative workaround follows.
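
The alternative is to bracket the nearbyint call with fegetround()/fesetround(), accepting their overhead (strictly speaking, #pragma STDC FENV_ACCESS ON is needed for this to be well-defined):

#include <fenv.h>
#include <math.h>

/* force round-to-nearest for nearbyint(), then restore caller's mode */
static double nearbyint_tonearest (double x)
{
    int mode = fegetround ();  // save current rounding mode
    fesetround (FE_TONEAREST); // nearbyint() now rounds to nearest
    double r = nearbyint (x);
    fesetround (mode);         // restore caller's rounding mode
    return r;
}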

A note on performance: The C99 code below relies heavily on the use of fma(), which implements a fused multiply-add operation. On most modern hardware architectures, this is directly supported by a corresponding hardware instruction. Where this is not the case, the code may experience significant slow-down due to generally slow FMA emulation.
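
Whether a hardware FMA is likely in use can be checked at compile time: C99 specifies that math.h defines FP_FAST_FMA when fma() executes about as fast as a multiply plus an add, which in practice indicates a hardware instruction. A sketch (#warning is a widely supported extension, standardized only in C23):

#include <math.h>

#if !defined(FP_FAST_FMA)
#warning "fma() may be emulated in software; expect significant slow-down"
#endif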

#include <math.h>
#include <stdint.h>

/* Writes the sine result sin(πa) to the location pointed to by sp, and
   the cosine result cos(πa) to the location pointed to by cp.

   In extensive testing, no errors > 0.97 ulp were found in either the sine
   or cosine results, suggesting the results returned are faithfully rounded.
*/
void my_sincospi (double a, double *sp, double *cp)
{
    double c, r, s, t, az;
    int64_t i;

    az = a * 0.0; // must be evaluated with IEEE-754 semantics
    /* for |a| >= 2**53, cospi(a) = 1.0, but cospi(Inf) = NaN */
    a = (fabs (a) < 9.0071992547409920e+15) ? a : az;  // 0x1.0p53
    /* reduce argument to primary approximation interval (-0.25, 0.25) */
    r = nearbyint (a + a); // must use IEEE-754 "to nearest" rounding
    i = (int64_t)r;
    t = fma (-0.5, r, a);
    /* compute core approximations */
    s = t * t;
    /* Approximate cos(pi*x) for x in [-0.25,0.25] */
    r =            -1.0369917389758117e-4;
    r = fma (r, s,  1.9294935641298806e-3);
    r = fma (r, s, -2.5806887942825395e-2);
    r = fma (r, s,  2.3533063028328211e-1);
    r = fma (r, s, -1.3352627688538006e+0);
    r = fma (r, s,  4.0587121264167623e+0);
    r = fma (r, s, -4.9348022005446790e+0);
    c = fma (r, s,  1.0000000000000000e+0);
    /* Approximate sin(pi*x) for x in [-0.25,0.25] */
    r =             4.6151442520157035e-4;
    r = fma (r, s, -7.3700183130883555e-3);
    r = fma (r, s,  8.2145868949323936e-2);
    r = fma (r, s, -5.9926452893214921e-1);
    r = fma (r, s,  2.5501640398732688e+0);
    r = fma (r, s, -5.1677127800499516e+0);
    s = s * t;
    r = r * s;
    s = fma (t, 3.1415926535897931e+0, r);
    /* map results according to quadrant */
    if (i & 2) {
        s = 0.0 - s; // must be evaluated with IEEE-754 semantics
        c = 0.0 - c; // must be evaluated with IEEE-754 semantics
    }
    if (i & 1) { 
        t = 0.0 - s; // must be evaluated with IEEE-754 semantics
        s = c;
        c = t;
    }
    /* IEEE-754: sinPi(+n) is +0 and sinPi(-n) is -0 for positive integers n */
    if (a == floor (a)) s = az;
    *sp = s;
    *cp = c;
}
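
A quick usage sketch that exercises the special-case handling (expected output shown in the comments):

#include <stdio.h>

int main (void)
{
    double s, c;
    my_sincospi (0.5, &s, &c);
    printf ("sinpi(0.5)  = %g  cospi(0.5)  = %g\n", s, c); // 1, 0
    my_sincospi (-1.0, &s, &c);
    printf ("sinpi(-1.0) = %g  cospi(-1.0) = %g\n", s, c); // -0, -1
    return 0;
}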

The single-precision version differs basically only in the core approximations. Exhaustive testing allows the precise determination of error bounds; a sketch of such a test harness follows the code.

#include <math.h>
#include <stdint.h>

/* Writes the sine result sin(πa) to the location pointed to by sp, and
   the cosine result cos(πa) to the location pointed to by cp.

   In exhaustive testing, the maximum error in sine results was 0.96677 ulp,
   the maximum error in cosine results was 0.96563 ulp, meaning results are
   faithfully rounded.
*/
void my_sincospif (float a, float *sp, float *cp)
{
    float az, t, c, r, s;
    int32_t i;

    az = a * 0.0f; // must be evaluated with IEEE-754 semantics
    /* for |a| >= 2**24, cospi(a) = 1.0f, but cospi(Inf) = NaN */
    a = (fabsf (a) < 0x1.0p24f) ? a : az;
    r = nearbyintf (a + a); // must use IEEE-754 "to nearest" rounding
    i = (int32_t)r;
    t = fmaf (-0.5f, r, a);
    /* compute core approximations */
    s = t * t;
    /* Approximate cos(pi*x) for x in [-0.25,0.25] */
    r =              0x1.d9e000p-3f;
    r = fmaf (r, s, -0x1.55c400p+0f);
    r = fmaf (r, s,  0x1.03c1cep+2f);
    r = fmaf (r, s, -0x1.3bd3ccp+2f);
    c = fmaf (r, s,  0x1.000000p+0f);
    /* Approximate sin(pi*x) for x in [-0.25,0.25] */
    r =             -0x1.310000p-1f;
    r = fmaf (r, s,  0x1.46737ep+1f);
    r = fmaf (r, s, -0x1.4abbfep+2f);
    r = (t * s) * r;
    s = fmaf (t, 0x1.921fb6p+1f, r);
    if (i & 2) {
        s = 0.0f - s; // must be evaluated with IEEE-754 semantics
        c = 0.0f - c; // must be evaluated with IEEE-754 semantics
    }
    if (i & 1) {
        t = 0.0f - s; // must be evaluated with IEEE-754 semantics
        s = c;
        c = t;
    }
    /* IEEE-754: sinPi(+n) is +0 and sinPi(-n) is -0 for positive integers n */
    if (a == floorf (a)) s = az;
    *sp = s;
    *cp = c;
}
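
A sketch of the exhaustive test harness mentioned above; the double-precision my_sincospi() serves as the reference, being accurate enough to resolve single-precision ulp errors:

#include <math.h>
#include <stdint.h>
#include <string.h>

void test_exhaustive (void)
{
    uint32_t bits = 0;
    do {
        float a, s, c;
        double sref, cref;
        memcpy (&a, &bits, sizeof a); // enumerate all 2**32 encodings
        my_sincospif (a, &s, &c);
        my_sincospi ((double)a, &sref, &cref);
        /* ... compute and record the ulp error of s versus sref and of
           c versus cref here ... */
    } while (bits++ != UINT32_MAX);
}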
njuffa
  • To the extent that you expressly depend on IEEE 754 semantics, how do you get around the fact that the C standard does not require implementations' floating-point representations or arithmetic to comply with IEEE 754 (at all)? – John Bollinger Mar 14 '17 at 18:42
  • Similarly, with respect to rounding, round to-nearest-or-even is the default rounding mode specified by *IEEE 754*, not by C. C does not even have among its standard predefined rounding-mode macros a distinction between the different round-to-nearest modes. – John Bollinger Mar 14 '17 at 18:49
  • @JohnBollinger I don't. *If* a tool chain offers sufficient control of floating-point formats and transformations in accordance with IEEE-754 rules, *then* this code works correctly with respect to IEEE-754 (best I could test it). Conversely, if a tool chain generally does *not* conform to IEEE-754, there should be *no* expectation (nor do I see a necessity) for this code to comply with all requirements of IEEE-754 either. – njuffa Mar 14 '17 at 18:55
  • @JohnBollinger `FE_TONEAREST` should be sufficient here; tie cases may round "to even" or "away from zero", unless I overlooked something. – njuffa Mar 14 '17 at 19:01
  • Out of curiosity, why do you use hex floats and decimal doubles? – rici Mar 14 '17 at 19:39
  • @rici No specific reason, other than that I pulled the double-precision code from an older C++ code base of mine, and C++ did not yet support hexfloats at the time (as far as I am aware, even C++11 doesn't support them yet). The single-precision version is more recent, from a C99 code base of mine. – njuffa Mar 14 '17 at 19:42
  • In the last step of the calculation of the sine, instead of computing `s = s * t; r = r * s; s = fma (t, π, r);` which amounts to computing `s = π*t + t^3`, a multiplication by `t` can be factored out so that a `fma` and a further multiplication suffice: `s = fma (r, s, 3.1415926535897931e+0); s = s * t`. – Matías Giovannini May 25 '18 at 15:19
  • @MatíasGiovannini This re-ordering causes maximum ulp error to increase (anecdotally to ~ 1.5 ulp), so the implementation is no longer faithfully rounded (which was a design goal of mine). This may be acceptable in some contexts. – njuffa May 25 '18 at 15:57
  • `precise` might not be the correct floating point model for many compilers (unlike `strict` it will allow transformations that - according to some compiler-specific model - increase precision) – Joe Apr 21 '20 at 11:04
  • You've clearly put a lot of good work into making an extremely high quality implementation here. But it seems to me that the suboptimal specification of this function places an unfortunate ceiling on its usefulness. Wouldn't it be more useful to implement `sincospi12(x)` = sine and cosine of x times pi/12? Then `sincospi` could be implemented straightforwardly in terms of `sincospi12`, but not vice-versa, it seems to me. And then we'd have exactly correct answers for, e.g. sin(pi/6) = 0.5. (Idea is from @phuclv in comment on https://stackoverflow.com/questions/16193783/#answer-16195094 ) – Don Hatch May 13 '22 at 06:59
  • @DonHatch As someone who was responsible for standard math library design and implementation at some point in the past, I am always skeptical of adding to the pantheon of math functions without clear popular demand and clearly-defined benefits. FWIW, the post above was my contribution to π-Day 2017. – njuffa May 13 '22 at 07:10
  • Hi @njuffa , happy belated pi-day! Your concern makes sense, of course, and in that spirit I'm trying to discern what the clearly-defined benefit of sincospi was. My understanding (and why it's valuable to me personally) is that, for example, it gives the exact answer for, e.g. sine and cosine of pi and of pi/2. That's nice, and valuable to me, but then I still have my next problem, which is that I don't have a function that gives the exact answer for, say, sin(pi/6). A sincospi12 function seems to me like it would solve this class of problem, completely and solidly. No? – Don Hatch May 13 '22 at 07:39
  • (actually, it now occurs to me that sincospi3 would be just as good) – Don Hatch May 13 '22 at 07:40
  • FWIW: I'm seeing a lower count of faithfully rounded results by changing the lead coefficient to a two-word value: (f32_sinpi_k7a) found here https://marc-b-reynolds.github.io/math/2020/03/11/SinCosPi.html – MB Reynolds Nov 14 '22 at 01:08
  • @MBReynolds That is to be expected. In my experience of being responsible for a standard math library, people who compute in single precision crave performance first and foremost (with 3D graphics and signal processing important application areas). For these folks, accuracy beyond a faithfully rounded implementation that costs performance is not attractive. Conversely, people who compute in double precision (many scientific computing tasks) usually desire accuracy close to 0.5 ulp; adding a few operations to improve to that level is often welcome. – njuffa Nov 14 '22 at 01:39