One major thing to watch out for is that the C language originally specified that a computation like float a=b+c+d; would convert b, c, and d to the longest available floating-point type (which happened to be type double), add them together, and then convert the result to float. Such semantics were simple for the compiler and helpful for the programmer, but had a slight difficulty: the most efficient format for storing numbers isn't the same as the most efficient format for performing computations. On machines without floating-point hardware, it's faster to perform computations on a value stored as a not-necessarily-normalized 64-bit mantissa with a separately-stored 15-bit exponent and sign than to operate on values stored as a 64-bit double, which must be unpacked before every operation and then normalized and repacked afterward (even if only to be immediately unpacked again for the next operation). Having machines keep intermediate results in the longer format improved both speed and accuracy; ANSI C allowed for this with type long double.
Unfortunately, ANSI C failed to provide a means by which variable-argument functions could indicate whether they wanted all floating-point values converted to long double, all converted to double, or float and double passed as double and long double passed as long double. Had such a facility existed, it would have been easy to write code that didn't have to distinguish between double and long double values. Because no such feature exists, code does have to care about the distinction on systems where double and long double are different types, and doesn't on systems where they are the same. This in turn means that a lot of code written on systems where the types are the same will break on systems where they aren't; compiler vendors decided the easiest fix was simply to make long double synonymous with double and not provide any type that could hold intermediate computations accurately.
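To illustrate the kind of distinction code is forced to make, here is a minimal sketch using printf: in a variadic call, float is promoted to double, but long double is passed through unchanged, so the caller has to say which type it is passing via the length modifier in the format string.

#include <stdio.h>

int main(void)
{
    float f = 0.1f;
    double d = 0.1;
    long double ld = 0.1L;

    /* float is promoted to double in the variadic call, so plain %f covers both... */
    printf("%f\n", f);
    printf("%f\n", d);
    /* ...but long double needs the L length modifier, so the code must know which type it has. */
    printf("%Lf\n", ld);
    return 0;
}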
Since having intermediate computations performed in an unrepresentable type is bad, some people decided the logical thing was to have computations on float be performed as type float. While there are some hardware platforms where this may be faster than using type double, it often has undesirable consequences for accuracy. Consider:
#include <math.h>

float triangleArea(float a, float b, float c)
{
    long double s = (a+b+c)/2.0;
    return sqrt(s*(s-a)*(s-b)*(s-c));
}
On systems where intermediate computations are performed using long double, this will yield good accuracy. On systems where intermediate computations are performed as float, it may yield horrible accuracy even when a, b, and c are all precisely representable. For example, if a and b are 16777215.0f and c is 4.0f, the value of s should be 16777217.0, but if the sum of a, b, and c is computed as float, it will be 16777216.0; this will yield an area which is less than half the correct value. If a and c were 16777215.0f and b were 4.0f (same numbers, different order), then s would get computed as 16777218.0, yielding an area which is 50% too big.
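If you want to watch the rounding happen, here is a small sketch of the first case; the comments assume it runs on a platform such as x64 with SSE arithmetic, where float expressions really are evaluated in single precision.

#include <stdio.h>

int main(void)
{
    float a = 16777215.0f, b = 16777215.0f, c = 4.0f;

    /* Each addition rounds to a 24-bit significand, so 33554434 becomes 33554432. */
    float s_float = (a + b + c) / 2.0f;

    /* Widening the first operand keeps every intermediate exact for these values. */
    long double s_wide = ((long double)a + b + c) / 2.0L;

    printf("float intermediates:       %.1f\n", (double)s_float); /* 16777216.0 */
    printf("long double intermediates: %.1Lf\n", s_wide);         /* 16777217.0 */
    return 0;
}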
If you have calculations which yield good results on x86 (where many compilers eagerly promote intermediates to an 80-bit type, even though they unhelpfully make that type unavailable to the programmer) but lousy results on x64, I would guess you may have a calculation like the one above which needs its intermediate steps performed at higher precision than the operands or the final result. Changing the first line of the above method to:

long double s = ((long double)a+b+c)/2.0;

will force the intermediate computations to be done at higher precision, rather than performing the computations at low precision and then storing the inaccurate result into a higher-precision variable.
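Putting it together, a corrected sketch of the whole function might look like the following; it uses sqrtl (the long double square root from <math.h>) rather than sqrt, so even the final product isn't rounded down to double before the root is taken.

#include <math.h>

float triangleArea(float a, float b, float c)
{
    /* Carry every intermediate in long double; only the final result is rounded to float. */
    long double s = ((long double)a + b + c) / 2.0L;
    return (float)sqrtl(s * (s - a) * (s - b) * (s - c));
}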