
An error trap trips after running a physics simulation for about 20 minutes. Realising this would be a pain to debug, I duplicated the relevant subroutine in a new project, and called it with hard-coded copies of the original input data at the moment the error occurred. But the error trap did not trip! After two days of tedious work to isolate the exact point where the behaviours of the two instances of the subroutine diverge, I’ve traced the problem to a very simple function for computing a Cross_Product.

This Cross_Product function is identical in both programs; I have even checked the disassembly and confirmed that the compiler produces identical code. The function also receives identical input data in both cases, and I have explicitly verified that the rounding mode inside the function is the same. Yet the two programs return slightly different results: specifically, the LSB differs for two of the three returned vector components. The debugger itself confirms that those two components are not equal to the expressions they were just assigned. (See screenshot below.)

Debug screenshot

In the original program, the debugger shows “true” in all of the last three lines of the watch list instead of only the last one.

I’m using Code::Blocks 13.12 with the GCC compiler on Windows XP, on an AMD Athlon 64 CPU. However, I recompiled and ran the test program in Code::Blocks 16.01 on a much more modern Windows 10 machine with an Intel Core i5 CPU, and the results were identical.

Here is my minimal, complete, and verifiable code to reproduce the bizarre result which disagrees with my original program AND with the debugger itself (unfortunately, I can’t include the original physics program because it’s HUGE):

extern "C" {
    __declspec(dllimport) int __stdcall IsDebuggerPresent(void);
    __declspec(dllimport) void __stdcall DebugBreak(void);
}

struct POLY_Triplet {
   double XYZ[3];
};

POLY_Triplet Cross_Product(POLY_Triplet Vector1, POLY_Triplet Vector2) {
   POLY_Triplet Result;

   Result.XYZ[0] = Vector1.XYZ[1] * Vector2.XYZ[2] - Vector1.XYZ[2] * Vector2.XYZ[1];
   Result.XYZ[1] = Vector1.XYZ[2] * Vector2.XYZ[0] - Vector1.XYZ[0] * Vector2.XYZ[2];
   Result.XYZ[2] = Vector1.XYZ[0] * Vector2.XYZ[1] - Vector1.XYZ[1] * Vector2.XYZ[0];

   return Result;
}

int main() {
   POLY_Triplet Triplet1;

   POLY_Triplet Collision_Axis_Vector;

   POLY_Triplet Boundary_normal;

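   // Install exact IEEE-754 bit patterns by type-punning the doubles.
   // (As the answers below point out, these casts violate strict aliasing.)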
   *(long long int *)(&Collision_Axis_Vector.XYZ[0]) = 4594681439063077250;
   *(long long int *)(&Collision_Axis_Vector.XYZ[1]) = 4603161398996347097;
   *(long long int *)(&Collision_Axis_Vector.XYZ[2]) = 4605548671330989714;

   *(long long int *)(&Triplet1.XYZ[0]) = -4626277815076045984;
   *(long long int *)(&Triplet1.XYZ[1]) = -4637257536736295424;
   *(long long int *)(&Triplet1.XYZ[2]) = 4589609575355367200;

   if (IsDebuggerPresent()) {
      DebugBreak();
   }

   Boundary_normal = Cross_Product(Collision_Axis_Vector, Triplet1);

   return 0;
}

For convenience, here are the relevant lines for the watch list, as seen in the screenshot:

(Result.XYZ[0] == Vector1.XYZ[1] * Vector2.XYZ[2] - Vector1.XYZ[2] * Vector2.XYZ[1])
(Result.XYZ[1] == Vector1.XYZ[2] * Vector2.XYZ[0] - Vector1.XYZ[0] * Vector2.XYZ[2])
(Result.XYZ[2] == Vector1.XYZ[0] * Vector2.XYZ[1] - Vector1.XYZ[1] * Vector2.XYZ[0])

Can anyone please explain this behaviour?

Entropy

3 Answers

*(long long int *)(&Collision_Axis_Vector.XYZ[0]) = 4594681439063077250;

and all similar lines introduce Undefined Behavior in the program because they violate the Strict Aliasing rule:

you access a double object through an lvalue of type long long int.
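As the comments below suggest, a standard-conforming way to install the bit pattern is to copy the bytes instead, e.g. with memcpy. A minimal sketch (store_bits is a hypothetical helper name, not a library function):

#include <cstring>   // std::memcpy

// Copy a raw 64-bit pattern into a double without violating strict aliasing.
void store_bits(double &d, long long bits) {
    static_assert(sizeof d == sizeof bits, "double and long long must match in size");
    std::memcpy(&d, &bits, sizeof d);
}

// Usage, with one of the constants from the question:
//    store_bits(Collision_Axis_Vector.XYZ[0], 4594681439063077250LL);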

bolov
  • You should also demonstrate how he could accomplish what he wants by casting to a `char` array or using a union instead. – Frank Sep 03 '17 at 22:40
  • @Frank I don't know how to write to a double without violating the standard other than assigning a double value to it. – bolov Sep 03 '17 at 22:50
  • https://stackoverflow.com/questions/46028570/assigning-an-exact-representation-to-a-floating-point-value – bolov Sep 03 '17 at 23:10
  • 1
    @bolov Thank you for your suggestion. As pointed out via your own link on the Strict Aliasing rule, using an `unsigned char *` instead and writing each byte individually is legal, and should solve the Strict Aliasing problem. However, I have tried this, and it has made no difference whatsoever. I would have been surprised if it had, since the Strict Aliasing issue could hardly explain why only a few of the least significant bits are wrong. – Entropy Sep 04 '17 at 01:30
  • Use `memcpy` to avoid a strict aliasing violation – M.M Sep 04 '17 at 03:59

I compiled your sample with Visual C++, and I can confirm the output is slightly different from what you see in the debugger; here’s mine:

CAV: 4594681439063077250, 4603161398996347097, 4605548671330989714
T1: -4626277815076045984, -4637257536736295424, 4589609575355367200
CP: 4589838838395290724, -4627337114727508684, 4592984408164162561

I don’t know for sure what might cause the difference, but here’s an idea.

Since you’ve already looked at the machine code: what are you compiling to, legacy x87 or SSE? I presume it’s SSE; most compilers have targeted it by default for years. If you pass -march=native to GCC, your CPU very likely has an FMA instruction set (AMD since late 2011, Intel since 2013), so the compiler may have emitted fused multiply-add instructions (the ones behind the _mm_fmadd_pd / _mm_fmsub_pd intrinsics). Because an FMA rounds once instead of twice, that alone can produce your 1-bit difference.
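To illustrate the effect, here is a minimal sketch (not your code) showing how a single fused rounding can flip the last bit:

#include <cmath>    // std::fma, std::ldexp
#include <cstdio>

int main() {
    // Construct a product whose exact value lies halfway between two doubles.
    double e = std::ldexp(1.0, -27);      // 2^-27
    double a = 1.0 + e, b = 1.0 - e;      // exact product: 1 - 2^-54
    double separate = a * b - 1.0;        // product rounds to 1.0 first, so this is 0
    double fused = std::fma(a, b, -1.0);  // single rounding at the end: -2^-54
    // Compile with -ffp-contract=off so the compiler doesn't fuse "separate" too.
    std::printf("separate: %g\nfused:    %g\n", separate, fused);
    return 0;
}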

However, that’s all theory. My advice: instead of trying to find out what caused the difference, fix your outer code.

It’s a bad idea to trap to the debugger as the result of a condition like this.

The numerical difference is very small: it’s the least significant bit of the 52-bit stored mantissa, i.e. a relative error of just 2^(-52).

Even if you find out what caused it and disable, say, FMA or whatever else was responsible, such a fix is fragile: you’ll be supporting that hack for the lifetime of the project. Upgrade your compiler, let it optimise your code differently, or even upgrade the CPU, and your code may break in a similar way.

A better approach: just stop comparing floating-point numbers for exact equality. Instead, calculate e.g. the absolute difference and compare it with a small enough constant.
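For instance, a minimal sketch of such a comparison (the name nearly_equal and the tolerance value are illustrative choices, not a prescription):

#include <cmath>   // std::fabs

// Treat two doubles as equal when their absolute difference is tiny.
bool nearly_equal(double a, double b, double tolerance = 1e-9) {
    return std::fabs(a - b) < tolerance;
}

// e.g. instead of (Result.XYZ[0] == expected):
//    if (nearly_equal(Result.XYZ[0], expected)) { ... }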

Soonts
  • Your results are identical to the results I get in the original physics program. The Athlon 64 has been around since 2003. I can’t tell you whether it supports FMA instructions or not, but I can tell you from looking at the disassembly that such instructions are not being used. And even if they were, the results should still be the same in both cases, since I already said the disassembly listings are identical! – Entropy Sep 04 '17 at 02:20
  • As for your advice, you misunderstand the situation. I’m not comparing for exact equality. The error trap is something else. The 1-bit difference matters in this situation because I’m trying to recreate conditions which occur after 20 minutes of number crunching to diagnose the error that’s being trapped, and this 1-bit difference just happens to represent an edge-case which alters the flow of the program and ruins the experiment. – Entropy Sep 04 '17 at 02:20
  • @Entropy What exactly do you see in your disassembly? x87 instructions like fmul, fsub, or SSE2 instructions like mulsd, subsd? – Soonts Sep 04 '17 at 02:34
  • I see the following instructions: `fldl`, `fmulp`, `fsubrp`, `mov`, `fstpl`. – Entropy Sep 04 '17 at 02:40

I can confirm that the problematic output you’re getting can be caused by a change in x87 precision. The precision setting is stored in the x87 FPU control register; once changed, the value persists for the lifetime of your thread, affecting all x87 code running on that thread.

Apparently, some other component of your huge program (or an external library you use) sometimes changes the mantissa length from 53 bits (the default) to 64 bits (i.e. the full precision of the 80-bit x87 registers).

The best fix is to switch your compiler from the x87 to the SSE2 target. SSE always uses either 32-bit or 64-bit floats (depending on the instructions used); it has no 80-bit registers at all. Even your 2003 Athlon 64 supports that instruction set. As a side effect, your code will become somewhat faster.

Update: If you don’t want to switch to SSE2, you can reset the precision to whatever value you like. Here’s how to do that in Visual C++:

#include <float.h>
unsigned int prev; // _controlfp_s expects an unsigned int
_controlfp_s( &prev, _PC_53, _MCW_PC ); // or _PC_64 for the full 80-bit precision

For GCC, it’s something like this (untested)

#include <fpu_control.h> // glibc header; may need to be obtained separately on MinGW
#define _FPU_PRECISION ( _FPU_SINGLE | _FPU_DOUBLE | _FPU_EXTENDED )
fpu_control_t prev, curr;
_FPU_GETCW( prev );
curr = ( prev & ~_FPU_PRECISION ) | _FPU_DOUBLE; // or _FPU_EXTENDED for 80-bit
_FPU_SETCW( curr );
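To check which mode a thread is currently in, here is a minimal sketch that reads and prints the control word (again assuming the glibc fpu_control.h header; 0x037F and 0x027F are the standard extended- and double-precision control words):

#include <cstdio>
#include <fpu_control.h>

// Read and print the x87 control word; 0x037F means extended (64-bit mantissa)
// precision, 0x027F means double (53-bit mantissa) precision.
void Print_X87_Control_Word() {
    fpu_control_t cw;
    _FPU_GETCW( cw );
    std::printf( "x87 control word: 0x%04X\n", (unsigned)cw );
}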
Soonts
  • That certainly seems the most plausible explanation I’ve heard all night. Unfortunately, your suggested solution will not solve the problem in this case, since if you are right, the conditions I am trying to reproduce rely on the FPU being set to the higher precision, and I need to change the behaviour of the test program to reflect the conditions of the physics simulation, not the other way round. It would be most helpful if you could offer a means of accessing this FPU control register so that I can verify your explanation and also fix my test program. – Entropy Sep 04 '17 at 08:06
  • Updated. Don’t forget most modern compilers default to SSE2 for code like yours. Getting x87 instructions out of them requires extra project settings/compiler flags. – Soonts Sep 04 '17 at 08:52
  • I had to download five missing header files to make it compile, but your solution was spot on, thank you! This also implies my original program may not be doing arithmetic consistently, since I don’t know at what point the control register gets changed. I suppose that’s something else for me to investigate later. As for switching to SSE2, I like the idea of doing that for the extra performance, but I’m not completely sure how. I’ve never used the `-march` options, and I don’t feel confident about playing with them until I have a better understanding of what they do. – Entropy Sep 04 '17 at 20:58
  • About SSE: you can download my auto-generated CHM from https://github.com/Const-me/IntelIntrinsics and expand Technology/SSE2; this should give you an idea of what these SSE2 instructions do. The intrinsics relevant to double math are the ones ending with `_pd` and `_sd`. – Soonts Sep 05 '17 at 08:20
  • I’m afraid I have no idea what a CHM is, but it is the compiler flags, rather than the SSE2 instructions themselves, which I’d like to know more about before I’d feel comfortable playing with them. At first I thought those compiler options were just performance tweaks for different chips. Now I’m getting the impression that they refer to different instruction sets. That makes me worry about compatibility between different processors. Also, the description for the AMD Athlon64 option says nothing explicitly about SSE, unlike some of the other options. So I’m pretty confused here. – Entropy Sep 11 '17 at 23:46
  • I have isolated the cause of the discrepancy in the precision settings: At the entry points of both programs, the control register is set to `0x037F`, which means extended precision. However, in my physics simulator, I call `_beginthreadex()` to spawn a new thread to do the arithmetic. For some reason, `_beginthreadex()` initialises this new thread with an FPU control register set to `0x027F`, which means only double precision. I don’t know why this happens, although it’s not a problem for me now that I’m aware of it. – Entropy Sep 12 '17 at 00:08
  • Can’t help you with the compiler flags; I mostly code C++ for the MS compiler, and these flags are very compiler-specific. The Athlon 64 supports SSE2: https://en.wikipedia.org/wiki/Athlon_64 If you want to be 100% sure, download CPU-Z (https://www.cpuid.com/softwares/cpu-z.html), run it on that Athlon, and look at “Instructions” on the first tab. – Soonts Sep 12 '17 at 10:26