
I have a very strange bug in my program. I was not able to isolate the error in a reproducible example, but at a certain place in my code there is:

    double distance, criticalDistance;
    ...

    if (distance > criticalDistance)
    {
        std::cout << "first branch" << std::endl;
    }
    if (distance == criticalDistance)
    {
        std::cout << "second branch" << std::endl;
    }

In the debug build everything is fine: only one branch gets executed.

But in the release build all hell breaks loose and sometimes both branches get executed.

This is very strange, since if I add an else:

    if (distance > criticalDistance)
    {
        std::cout << "first branch" << std::endl;
    }
    else if (distance == criticalDistance)
    {
        std::cout << "second branch" << std::endl;
    }

This does not happen.

Please, what can be the cause of this? I am using gcc 4.8.1 on Ubuntu 13.10 on a 32-bit computer.

EDIT1:

I am using the compiler flags

  • `-std=gnu++11`
  • `-gdwarf-3`

EDIT2:

I do not think this is caused by a memory error. I analyzed both the release and debug builds with the valgrind memory analyzer, with tracking of uninitialized memory and detection of self-modifying code, and I found no errors.

EDIT3:

Changing the declaration to

    volatile double distance, criticalDistance;

makes the problem go away. Does this confirm woolstar's answer? Is this a compiler bug?

EDIT4:

Using the gcc option `-ffloat-store` also fixes the problem. If I understand this correctly, this is caused by gcc.
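
A quick way to check whether a given target evaluates floating point in extended precision (a diagnostic sketch of mine, assuming a C99/C++11 compiler; `FLT_EVAL_METHOD` comes from C99 via `<cfloat>`):

    #include <cfloat>
    #include <iostream>

    int main()
    {
        // 0: arithmetic is evaluated in the declared type (e.g. SSE);
        // 2: everything is evaluated in long double precision (the 80-bit
        //    x87 format), which is what makes -ffloat-store matter.
        std::cout << "FLT_EVAL_METHOD = " << FLT_EVAL_METHOD << std::endl;
    }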

Martin Drozdik
  • Because of rounding errors, it is usually unwise to check for equality for doubles. Instead check that the absolute value of the difference is within some tolerance. – Robert Jacobs Jan 09 '14 at 15:42
  • @RobertJacobs Thank you! I am aware of this. In this case however I want to compare for strict equality, since the numbers I am comparing were created from exactly the same computation. – Martin Drozdik Jan 09 '14 at 15:44
  • You don't say what options your release build uses, but does it include `-ffast-math` by any chance? – Mark B Jan 09 '14 at 15:46
  • Even though the computation is the same, don't compare for equality. There's no real precision guarantee for floats. There's a good reason why the "rule" is "Never compare for equality between floats" - the operation is more or less undefined and isn't reliable at all - so don't rely on it. I wrote something about this in this C# question - http://stackoverflow.com/questions/20222314/is-math-absx-double-epsilon-equivalent-to-math-absx-0d/20223271 – Luaan Jan 09 '14 at 15:56
  • For a simple example of why comparing floating point numbers is usually a bad idea, type the following in your browser URL bar: javascript:alert(0.1+0.2 == 0.3); then type javascript:alert(0.1+0.2); to see what happened there. – Kevin Jan 09 '14 at 16:00
  • @Luaan: Actually it's because of this: http://stackoverflow.com/a/8044894/560648 – Lightness Races in Orbit Jan 09 '14 at 16:00
  • @Luaan huh? There is nothing undefined about floating point arithmetic, and it can be relied upon. Of course, if specific compiler flags are used (like `-ffast-math`), it becomes a lot less reliable, but if that is not specified, then yes, FP arithmetic is perfectly reliable. Saying otherwise is just spreading misinformation. – jalf Jan 09 '14 at 16:01
  • @jalf “There is nothing undefined about floating point arithmetic and it can be relied upon unless you're in a situation where it can't be relied upon.” Thanks for the tautology :) In any case, I do agree that as long as everything is completely IEEE-754 compliant, doing the same computation twice using the same input values will not allow you to have `a > b && a == b`. Which suggests that some part of the software and/or hardware is not IEEE-754 compliant, for whatever reason. – Luaan Jan 09 '14 at 16:28
  • In the actual code, are the comparisons actually `x > y` and `x == y`, where `x` and `y` are simple identifiers and not larger expressions? – Eric Postpischil Jan 09 '14 at 16:40
  • I strongly suspect that Woolstar has the right answer. You can quibble over whether it's a compiler bug or not, but expecting precise FP behavior from compiled/optimized code is probably unrealistic. If you need precisely predictable behavior write it in assembler. – Hot Licks Jan 09 '14 at 18:04
  • @HotLicks Actually, the fun part is, assembler isn't necessarily enough - see the accepted answer regarding Intel FPUs. You can work around it (both in C++ and assembler), but you first have to know about the possibility. And having knowledge about every single part of every single possible configuration of hardware and software is unrealistic; that's why we even have all those abstractions like IEEE-754, which simplify your job (at the expense of HW designers and performance). As this example shows, even that is a leaky abstraction - though easy to work around as soon as you know about it. – Luaan Jan 10 '14 at 09:59
  • @Luaan - In assembler you can reliably achieve predictability by inserting the order to truncate the results to 64 bits after each step. In C++ you must do this by storing to a volatile between steps or some such, and you can't really rely on the optimizer not screwing you up even then. – Hot Licks Jan 10 '14 at 11:39
  • @HotLicks Yes, but then you're killing the whole optimization. Instead of one FPCMP, you're doing extra work. The "error" is in the CPU/FPU, not the C++ compiler. If you know about it and expect it, this is no problem (and you can force it to give you the results you want if you really need to). If you don't know about this "trivia", manually written assembly isn't going to save you. – Luaan Jan 10 '14 at 12:08
  • @Luaan - You can say that about just about everything having to do with floating point. – Hot Licks Jan 10 '14 at 12:16
  • @Luaan: The error is not in the CPU/FPU. The floating-point hardware does not alter precision unpredictably. You could easily make calculations such as the OP describes by performing them entirely in one precision; no extra steps would be needed. The Intel architecture provides ways to work in various floating-point formats, including writing them from registers to memory and reading from memory to registers. The vague notion that writing data from a register to memory changes precision comes from fuzzy ideas about how compilers manage floating-point data, not from how the hardware behaves. – Eric Postpischil Jan 10 '14 at 14:09

3 Answers

    if (distance > criticalDistance)
        // true
    if (distance == criticalDistance)
        // also true

I have seen this behavior before in my own code. It is due to the mismatch between the standard 64-bit value stored in memory and the 80-bit internal values that Intel processors use for floating point calculation.

Basically, when truncated to 64 bits your values are equal, but as 80-bit values one is slightly larger than the other. In debug mode, the values are always stored to memory and then reloaded, so they are always truncated. In optimized mode, the compiler reuses the value in the floating point register and it doesn't get truncated.
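
Here is a minimal sketch of the effect (hypothetical: whether it actually fires depends on the compiler keeping the result in an x87 register, e.g. 32-bit gcc at `-O2` without `-ffloat-store`):

    #include <iostream>

    int main()
    {
        volatile double num = 1.0, den = 3.0; // volatile defeats constant folding
        double r = num / den;        // the x87 divide leaves an 80-bit result in a register
        volatile double stored = r;  // forces a round trip through a 64-bit memory slot
        // If r is still held at 80-bit precision it differs from its own
        // truncated copy, which is how both of the question's branches can run.
        if (r != stored)
            std::cout << "excess precision detected" << std::endl;
        else
            std::cout << "values compare equal" << std::endl;
    }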

woolstar
  • This could actually be it. I am using an Intel processor. – Martin Drozdik Jan 09 '14 at 16:07
  • Interesting. It would require the compiler to truncate the value in one comparison, but not in the other, though, which seems odd. But yes, if the code is more complex than what is shown here, that could very well be the cause. +1 – jalf Jan 09 '14 at 16:11
  • And it can be controlled, see http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx or on gcc, `-ffast-math` – Ben Voigt Jan 09 '14 at 16:14
  • This should not happen. The C and C++ standards require that, when an assignment is performed, the value be converted to its actual nominal type; excess precision must be discarded. Thus, the uses of `distance` and `criticalDistance` should be exactly the same in the two expressions. Excess precision could explain the behavior if there were compound expressions involved, such as `distance > r*r` and `distance == r*r`, which would allow for `r*r` to be computed differently in the two instances. – Eric Postpischil Jan 09 '14 at 16:35 (see the sketch after this thread)
  • While the OP may have experienced something like this, because their actual code, not shown in the problem, does contain compound expressions, this answer should not be accepted until we have ascertained either that their C++ implementation is non-conforming or the original code does contain compound expressions. In the latter case, the answer should be modified accordingly. In the current form, the answer misleads readers toward believing that `a > b` and `a == b` can both be true in conforming implementations, which is not the case. – Eric Postpischil Jan 09 '14 at 16:42
  • @EricPostpischil Please, do you have an idea what else it could be? – Martin Drozdik Jan 09 '14 at 16:43
  • @EricPostpischil while the standards require proper behavior, I have actually traced down this problem exactly in the `gcc` compiler in my own code. So in **practice**, `gcc` does this, regardless of what is supposed to happen in **theory**. – woolstar Jan 09 '14 at 16:45
  • @woolstar Please, is there a way to diagnose this problem so that I can be sure? And how did you solve your problem? Are you saying it is actually a bug in gcc? – Martin Drozdik Jan 09 '14 at 17:00
  • @EricPostpischil According to the question I link to below, g++'s documentation states that “‘-fexcess-precision=standard’ is not implemented for languages other than C”. This is GCC talk for “the standard is not respected wrt excess precision”. The OP also documented an example as an answer to their own question there. http://stackoverflow.com/questions/20869904/c-handling-of-excess-precision – Pascal Cuoq Jan 09 '14 at 17:33
  • @EricPostpischil In fact, the non-standard-conforming behavior can be obtained with gcc (the C compiler) if one specifies neither `-std=…` nor `-fexcess-precision=standard`. – Pascal Cuoq Jan 09 '14 at 17:38
  • @PascalCuoq: Okay, that should be incorporated into this answer. It should be made clear that this is a deficiency in GCC and is not generally applicable to C++ implementations. Additionally, a workaround is to write the critical parts in C and compile with `-fexcess-precision=standard`. – Eric Postpischil Jan 09 '14 at 17:49
  • @EricPostpischil The explanation seems to lie in `<` using the 10-byte FPU format and `==` using 8 bytes. However I cannot see people refraining from those 2 extra bytes of precision for conformance reasons. – Joop Eggen Jan 09 '14 at 17:54
  • @JoopEggen: We are aware that differences arise when different precisions are used. The issue lies in the fact that the C++ standard forbids excess precision being used in the case shown in the question, and that the GCC C++ implementation is violating the standard. – Eric Postpischil Jan 09 '14 at 17:56
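
To illustrate Eric Postpischil's point about compound expressions, here is a hypothetical sketch (not the OP's code): when `FLT_EVAL_METHOD == 2`, each occurrence of `r * r` may be evaluated at extended precision or rounded to double independently, so the two comparisons need not agree even in a conforming implementation.

    #include <iostream>

    int main()
    {
        volatile double vr = 0.1;   // volatile blocks constant folding
        double r = vr;
        double distance = r * r;    // the assignment rounds the result to 64 bits
        // Whether each r * r below is compared at 80-bit or 64-bit precision
        // is up to the compiler, so "both branches" behavior like the
        // question's becomes possible once compound expressions are involved.
        if (distance > r * r)
            std::cout << "first branch" << std::endl;
        if (distance == r * r)
            std::cout << "second branch" << std::endl;
    }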

> Please, what can be the cause of this?

Undefined behavior, a.k.a. bugs in your code.

There is no IEEE floating point value which exhibits this behavior. So what's happening is that you are doing something wrong, which violates an assumption made by your compiler.

When optimizing your code, the compiler assumes that your code can be described by the C++ standard. If you do anything that is left undefined by the C++ standard, then these assumptions are violated, resulting in "weird" execution. It could be something "simple" like an uninitialized variable or a buffer overrun resulting in parts of the stack or heap being overwritten with garbage data, or it could be something more subtle, where you rely on a specific ordering between two operations, which is not guaranteed by the standard.
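
To illustrate, here is a sketch of one such bug (hypothetical, not taken from the question's code): an out-of-bounds write that silently corrupts a neighbouring variable, with consequences that can differ between debug and release stack layouts.

    #include <iostream>

    int main()
    {
        double distance = 2.0;
        int buf[4];
        // Off-by-one: when i == 4 this writes one element past the end of
        // buf. Depending on how the optimizer laid out the stack, the stray
        // write may clobber part of 'distance' or something else entirely.
        for (int i = 0; i <= 4; ++i)
            buf[i] = 0;
        std::cout << distance << std::endl; // undefined behavior: anything may happen
    }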

That is probably why you were not able to reproduce the problem in a small test case (the smaller test code does not contain the erroneous code), and why you only see the error in optimized builds.

Of course, it is also possible that you've stumbled across a compiler bug, but a bug in your code is quite a bit more likely. :)

And best of all, it means that we don't really have a chance to debug the problem from the code snippet you've shown. We can say "the code shouldn't behave like that", but that's about all.

jalf
  • True of IEEE, but how do we know the compiler is in IEEE-compliant mode? – Ben Voigt Jan 09 '14 at 16:15
  • “There is no IEEE floating point value which exhibits this behavior.” How nice life would be if it were so simple. Please see http://arxiv.org/abs/cs/0701192 , perhaps looking for the sections about “excess precision”. Monniaux's report is about C but actually, the C++ standard leaves the same doors open to be interpreted by compiler-makers as allowing this kind of behavior. – Pascal Cuoq Jan 09 '14 at 17:27
  • @PascalCuoq: you're right, I didn't take that into account. I was talking about operations on a single data type throughout. In this case, there are values for which both comparisons would *fail* (`NaN`), but none where both would succeed. But you're right, given that 32-bit x86 CPUs internally widen to 80 bits, the "excess precision" issue shouldn't be ignored. I stand corrected. :) – jalf Jan 10 '14 at 09:19

You are not initializing your doubles; are you sure that they always get a value?
I have found that uninitialized variables in debug builds are always 0, but in release they can be pretty much anything.
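
A minimal sketch of what that can look like (hypothetical, not the OP's code):

    #include <iostream>

    int main()
    {
        double distance; // never assigned: reading it is undefined behavior
        // Debug builds often happen to zero the stack slot, while optimized
        // builds may hand back whatever garbage is lying around, so the same
        // read can "work" in debug and misbehave in release.
        std::cout << distance << std::endl;
    }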

Simon Karlsson
  • I assume the initialization is somewhere in the "`...`", OP said this isn't the actual reproducible code. – interjay Jan 09 '14 at 16:00