I have come across a function which computes atan(x)
(the source is here). Reducing it to the core of my question and slightly reformatting it, they have something like that:
static const double one = 1.0,
huge = 1.0e300;
double atan(double x)
{
/* A lot of uninteresting stuff here */
if (ix < 0x3fdc0000) { /* |x| < 0.4375 */
if (ix < 0x3e200000) { /* |x| < 2^-29 */
if ((huge + x) > one) return x; /* raise inexact */
}
id = -1;
}
/* A lot of more uninteresting stuff */
}
I would be very interested in what the line if ((huge + x) ...
is supposed to do and how it works.
According to the comments, if the absolute value of x
is smaller than 2^-29
, the expression or the comparison raises an inexact
error.
My first problem is that I currently don't get why this is done that way: If computing the arctan
using that function leads to imprecise results if the absolute value of x
is too small, why don't they just use something like if (fabs(x) < [some_value_here]) ...
? I suspect that this is just because the inexact
warning wouldn't be raised that way in their hardware / library, but I'd like to know for sure.
Assuming that I am right, my second problem is that I don't understand why the comparison is needed. I think that the key point here is that a very small number is added to a very large one, so that this addition does not change the large one sufficiently or even not at all. Hence, it is the addition which will raise the inexact
warning, not the comparison. So I am asking myself what the comparison should do. Is this just to force the compiler to actually compute (huge + x)
, which might be optimized out otherwise?
Finally, I would be grateful if somebody could explain the math a little bit. Choosing the value 1.0e300
for huge
seems a pretty arbitrary choice. But this is only a bonus question since I admit I haven't done the math part of my homework yet (I am not a total newbie regarding double
values and their IEEE754 representation, but understanding the mathematical aspects of this code will take me some time, unless somebody gives a short explanation).
EDIT 1
Just seen by accident:
The float32
version of that function, including the weird line discussed above, is nearly literally still in glibc 2.19! Since glibc is meant to be portable, that code should be as well. It is in the sub-directory sysdeps\ieee754\flt-32
, so I suppose this is the software emulation of the float32
functions, where portability is no issue because hardware-dependent oddities won't show up (I think the software emulation raises those exceptions exactly as defined in IEEE754).