
I am writing a library for multiprecision arithmetic based on a paper I am reading. It is very important that I am able to guarantee the properties of the floating-point numbers I use: in particular, that they adhere to the IEEE 754 standard for double-precision floating-point numbers. Clearly I cannot guarantee the behavior of my code on an arbitrary platform, but on the x86 and x64 chipsets I am writing for there is a particular hazard: the x87 FPU on these chips may hold intermediate results in 80-bit extended-precision registers. I cannot tolerate my arithmetic being carried out in extended precision without being rounded to double precision after every operation, because the proofs of correctness for the algorithms I am using rely on that rounding occurring. I can easily identify cases in which extended precision would break these algorithms.
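For instance (a standard example, not one from the paper): the TwoSum error-free transformation recovers the exact rounding error of an addition, but its proof assumes every operation below is rounded to double precision. With 80-bit intermediates, `err` is no longer the exact error of `a + b`.

public static double TwoSum(double a, double b, out double err)
{
    double sum = a + b;
    double bVirtual = sum - a;        // the part of b absorbed into sum
    double aVirtual = sum - bVirtual; // the part of a absorbed into sum
    // Exact only if each +/- above was rounded to 64-bit double.
    err = (a - aVirtual) + (b - bVirtual);
    return sum;
}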

I am writing my code in C#. How can I guarantee certain values are rounded to double precision? In C, I would declare the variables volatile, forcing them to be written back to RAM. This is slow, and I would rather keep the numbers in registers as 64-bit floats, but correctness in these algorithms is the whole point, not speed. In any case, I need a solution for C#. If that seems infeasible, I will approach the problem in a different language.

Void Star
    As a workaround for the lack of `volatile double`, you could convert operands to/from bytes after each operation... It would be really slow, though. – Alexei Levenkov Aug 20 '15 at 02:55

2 Answers


The C# spec has this to say on the topic:

Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects.

As a result, if you need strictly IEEE 754-compliant behavior, you must turn to a third-party library that simulates an IEEE 754-compliant FPU in software. One such library is SoftFloat, which provides a SoftFloat type that uses operator overloads to simulate standard double behavior.
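To give a sense of the pattern (a hypothetical sketch, not SoftFloat's actual API): the wrapper stores the raw binary64 bits and overloads the arithmetic operators to call integer-only routines, so no intermediate ever touches an 80-bit FPU register.

using System;

public struct SoftDouble
{
    private readonly ulong _bits; // raw IEEE 754 binary64 encoding

    private SoftDouble(ulong bits) { _bits = bits; }

    public static SoftDouble FromDouble(double d)
    {
        return new SoftDouble((ulong)BitConverter.DoubleToInt64Bits(d));
    }

    public double ToDouble()
    {
        return BitConverter.Int64BitsToDouble((long)_bits);
    }

    public static SoftDouble operator +(SoftDouble a, SoftDouble b)
    {
        return new SoftDouble(AddBits(a._bits, b._bits));
    }

    // A real implementation decomposes sign/exponent/mantissa and adds
    // with IEEE 754 round-to-nearest-even; elided here for brevity.
    private static ulong AddBits(ulong a, ulong b)
    {
        throw new NotImplementedException();
    }
}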

David Pfeffer

An obvious problem with 80-bit intermediate values is that it is very much up to the compiler and optimizer to decide when a value is truncated back to 64 bits, so different compilers may end up producing different results for the same sequence of floating-point operations. Take an expression like `a*b*c*d`: depending on the availability of 80-bit floating-point registers, the compiler might round `a*b` to 64 bits but leave `c*d` at 80 bits. I guess this is the root of your question: you need to eliminate this uncertainty.

I think your options are pretty limited in managed code. You could use a third-party software emulation, as the other answer suggests. Or maybe you could try coercing the double to long and back. I have no way of checking right now whether this actually works, but you could try something like this between operations:

public static double Truncate64(double val)
{
    // Requires compiling with /unsafe. Taking the address forces val
    // into a 64-bit memory slot, so any extra x87 precision is
    // discarded; the bit-for-bit round trip through long then returns
    // the same double.
    unsafe
    {
        long l = *((long*) &val);
        return *((double*) &l);
    }
}
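If it does work, you would apply it after every individual operation; for example, for the `a*b*c*d` case above (illustrative usage of the Truncate64 helper):

public static double Product4(double a, double b, double c, double d)
{
    // Force a round to 64-bit double after each multiply, so the result
    // cannot depend on which intermediates stayed in 80-bit registers.
    double ab = Truncate64(a * b);
    double abc = Truncate64(ab * c);
    return Truncate64(abc * d);
}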

By the way, a shorter variant, `return *((long*) &val);`, also type-checks, but it is wrong: the long is converted to double numerically rather than reinterpreted bit-for-bit, so it returns garbage. If you want the same round trip without an unsafe block, BitConverter does it safely:

public static double Truncate64(double val)
{
    // Same bit round trip as above, but needs no /unsafe compiler flag.
    return BitConverter.Int64BitsToDouble(BitConverter.DoubleToInt64Bits(val));
}

Hope that helps.

jaket