8

I am working with an array of doubles called indata (in the heap, allocated with malloc), and a local double called sum.

I wrote two different functions to compare values in indata, and obtained different results. Eventually I determined that the discrepancy was due to one function using an expression in a conditional test, and the other function using a local variable in the same conditional test. I expected these to be equivalent.

My function A uses:

    if (indata[i]+indata[j] > max) hi++;

and my function B uses:

    sum = indata[i]+indata[j];
    if (sum>max) hi++;

After going through the same data set and max, I end up with different values of hi depending on which function I use. I believe function B is correct, and function A is misleading. Similarly when I try the snippet below

    sum = indata[i]+indata[j];
    if ((indata[i]+indata[j]) != sum) etc.

that conditional will evaluate to true.

While I understand that floating point numbers do not necessarily provide an exact representation, why does that in-exact representation change when evaluated as an expression vs stored in a variable? Is recommended best practice to always evaluate a double expression like this prior to a conditional? Thanks!

  • It's basically because computers can't represent numbers with total precision. Read about floating point. – Iharob Al Asimi Jun 04 '16 at 05:26
  • @iharob He acknowledged that in his last paragraph. But it doesn't explain why it's different depending on whether you assign the result to a variable. – Barmar Jun 04 '16 at 05:28
  • Is this compiled for x86 or x86-64? – Dolda2000 Jun 04 '16 at 05:29
  • @iharob I think the main question is why it changes "when evaluated as an expression vs stored in a variable"? It shouldn't be different when stored in a variable (assuming the variable is off same type). OP obviously did read about floating point representations. – taskinoor Jun 04 '16 at 05:29
  • 2
    When the assignment takes place in B, the value is required to round to the nearest `double` value (typically 64 bits). In function A, the conditional expression *may* be evaluated using a higher precision (e.g. 80 bits). – user3386109 Jun 04 '16 at 05:29
  • 1
    @user3386109: If the code is indeed compiled for x86 rather than x86-64 so that it uses x87 rather than SSE floating-point instructions, then that is indeed the most likely explanation. – Dolda2000 Jun 04 '16 at 05:34
  • for me, the biggest stopper to start answerting the question, is that all variables are undeclared, so I can only guess if they are _all_ double – Павел Jun 04 '16 at 06:15
  • You could force the compiler to use SSE math (`-mfpmath=sse`) on 32-bit code if you know it is supported by the target operating system and the processor in question, or to remove the extra rounding – Antti Haapala -- Слава Україні Jun 04 '16 at 08:17
  • 2
    See http://www.exploringbinary.com/when-doubles-dont-behave-like-doubles/ – Rick Regan Jun 04 '16 at 12:51
  • Thanks for the link, which provided good information. Another good source for those interested is [link](https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html). The article includes considerable detail and also two good discussions about an example very similar to mine. – stilllearning Jun 05 '16 at 03:59

1 Answers1

11

I suspect you're using 32-bit x86, the only common architecture subject to excess precision. In C, expressions of type float and double are actually evaluated as float_t or double_t, whose relationships to float and double are reflected in the FLT_EVAL_METHOD macro. In the case of x86, both are defined as long double because the fpu is not actually capable of performing arithmetic at single or double precision. (It has mode bits intended to allow that, but the behavior is slightly wrong and thus can't be used.)

Assigning to an object of type float or double is one way to force rounding and get rid of the excess precision, but you can also just add a gratuitous cast to (double) if you prefer to leave it as an expression without assignments.

Note that forcing rounding to the desired precision is not equivalent to performing the arithmetic at the desired precision; instead of one rounding step (during the arithmetic) you now have two (during the arithmetic, and again to drop unwanted precision), and in cases where the first rounding gives you an exact-midpoint, the second rounding can go in the 'wrong' direction. This issue is generally called double rounding, and it makes excess precision significantly worse than nominal precision for certain types of calculations.

R.. GitHub STOP HELPING ICE
  • 208,859
  • 35
  • 376
  • 711
  • Thanks for the explanation. I am running the code on an i7-3770 cpu running 64-bit Windows 7. However, my compiler is minGW, which is a 32 bit application. I will investigate compiler settings. I understand that there is a hardware level of precision higher than double, and will be more careful with using expressions vs variables. FYI, type casting the expression actually doesn't work in this case - same behavior as without it. – stilllearning Jun 05 '16 at 03:34
  • 1
    Try the cast with `-std=c99` or `-fexcess-precision=standard`. In some non-standards-conforming modes GCC gets the behavior wrong. – R.. GitHub STOP HELPING ICE Jun 06 '16 at 02:55
  • Thank you - either of those compiler command line options fixes the problem, so that this code behaves correctly: `if ((double)(indata[i]+indata[j]) > max) hi++;` – stilllearning Jun 20 '16 at 21:21