9

I am writing an (almost) IEEE 854 compliant floating point implementation in TeX (which only has support for 32-bit integers). This standard only specifies the result of +, -, *, /, comparison, remainder, and sqrt: for those operations, the result should be identical to rounding the exact result to a representable number (according to the rounding mode).

I seem to recall that IEEE specifies that transcendental functions (sin, exp...) should yield faithful results (in the default round-to-nearest mode, they should output one of the two representable numbers surrounding the exact result). Computing the sine of small numbers is rather straightforward: shift by a multiple of 2*pi to obtain a number in the range [0,2*pi], then do some more work to reduce the range to [0,pi/4], and use a Taylor series.

Now assume that I want to compute sin(1e300). For that I would need to find 1e300 modulo 2*pi. That requires to know 300 (316?) decimals of pi, because with only 16 decimals, the result would have no significance whatsoever (in particular, it souldn't be faithful).

Is there a standard on what the result of sin(1e300) and similar very large numbers should be?

What do other floating point implementations do?

Bruno Le Floch
  • 244
  • 4
  • 12

2 Answers2

11

There is no standard that requires faithful rounding of transcendental functions. IEEE-754 (2008) recommends, but does not require, that these functions be correctly rounded.

Most good math libraries strive to deliver faithfully rounded results over the entire range (yes, even for huge inputs to sin( ) and similarly hard cases). As you note, this requires that the library know somewhat more digits of π then there are digits in the largest representable number. This is called an "infinite-pi" argument reduction.

To the point that @spraff raises, good math libraries adopt the viewpoint that the inputs are infinitely precise (i.e., the function should behave as though the input is always represented accurately). One can debate whether or not this is a reasonable position, but thats the working assumption for essentially all good math libraries.

All that said, there are plenty of libraries that take the easy route and use a "finite-pi" reduction, which basically treats a function like sin( ) as though π were a representable finite number. It turns out that this doesn't really cause any trouble for most uses, and is certainly easier to implement.

Stephen Canon
  • 103,815
  • 19
  • 183
  • 269
  • 2
    "infinite pi argument reduction" helps when you have large arguments but not when you have *very* large arguments. It's fine for a library to assume its inputs are accurate but when input+epsilon==input it doesn't matter any more. – spraff Jul 12 '11 at 14:26
  • 2
    @spraff: It doesn't matter *for the purposes of backwards error analysis*. It does matter when some other forms of error analysis are used, and that's the standard to which the best math libraries hold themselves. See, for example, K.C. Ng's "Good to the Last Bit" for discussion. – Stephen Canon Jul 12 '11 at 14:27
  • On your third paragraph, you could note that good libraries do this because IEEE-754 (2008) considers floating point numbers to be infinitely precise. Could you please give examples of "good math libraries" versus those "that take the easy route"? I think that it would add to your answer. One reason why the easy route does not cause much trouble is that it doesn't violate trigonometric identities such as sin(2x)=2*sin(x)*cos(x). – Bruno Le Floch Jul 12 '11 at 14:32
  • @spraff: In my case it may also matter because at the end of the day I need to implement it also in Lua, and the two results must be identical to the last bit. – Bruno Le Floch Jul 12 '11 at 14:34
  • @Bruno: exactly correct as to why the easy route doesn't usually cause problems. Off the top of my head, I know that the system math library on the following platforms uses an infinite-pi reduction: IBM z/OS, HP-UX, Sun, Apple iOS. On x86, the math library included with ICC uses such a reduction; I believe that glibc has switched to using an infinite-pi reduction as well, but I'm less certain of that. – Stephen Canon Jul 12 '11 at 14:37
  • 1
    @Bruno: If you just need two implementations to match, you can always use the same finite pi approximation for both implementations. That should suffice as long as you are careful to do your arithmetic the same way in both. – Stephen Canon Jul 12 '11 at 14:41
  • To add to Stephen's list: CUDA's math library also uses an infinite-pi reduction. – njuffa Jul 13 '11 at 04:20
  • @njuffa: good to see you posting, and thanks for the addition. – Stephen Canon Jul 13 '11 at 13:23
  • 5
    @BrunoLeFloch The library of .NET is hardly "good". I just realized that when the argument exceeds `Pow(2, 63)`, the [`Sin`](http://msdn.microsoft.com/en-us/library/system.math.sin.aspx) method returns its argument unchanged! So while `Sin(9.223372036854775E+18) == 0.98734189131435146` is probably not precise, `Sin(9.223372036854776E+18)` is approx `9.22E+18` ... – Jeppe Stig Nielsen Oct 20 '13 at 08:11
0

If you're doing operations on such large numbers, of course you're going to run out of precision:

#include <iostream>
#include <math.h>

int main () {
    long double i = 1;
    std :: cout << sin (i) << "\n" << sin (i+0.1) << "\n";
    i = pow (10, 300);
    std :: cout << sin (i) << "\n" << sin (i+0.1);
}

Output:

0.841471

0.891207

-0.817882

-0.81788

If you can't represent the inputs accurately, you can't represent the outputs accurately. Subtracting pi*pow(10,int(log_10(n/pi)) or whatever is going to make things worse for "small" n but when n gets suitably large, you're just adding noise to noise and it doesn't matter any more.

phuclv
  • 37,963
  • 15
  • 156
  • 475
spraff
  • 32,570
  • 22
  • 121
  • 229
  • 3
    Or to put it simpler, `1e300 + 2pi` = `1e300 - 2pi` = `1e300`, so the possible exact value represented by `1e300` could have any value as its `sin`? – Damien_The_Unbeliever Jul 12 '11 at 14:22
  • 3
    @Damien_The_Unbeliever When `1e300` is rounded to the nearest representable `double` value, that rounding is over an interval of length `1e284` approximately. So when we round `1e300` to a representable `double`, we "pass through" extremely many `2pi`-length periods. So it is rather meaningless to take `sin`. Still, mathematically, _after_ we have rounded the argument to a representable value, it is possible to calculate `sin` of that exact argument. To do so requires a lot of precision of your representation of `2pi`. As I said in a comment to the other answer, in .NET, `Sin(1e300) == 1e300`. – Jeppe Stig Nielsen Oct 20 '13 at 08:29