Float operation difference in C vs C++

Question

Solution

Thanks to @Michael Veksler 's answer, I was put on the right track to search for a solution. @Christoph, in this post, suggests trying different compiler flags to set the precision of the floating-point operations.

For me, the -mpc32 flag solved the problem.

I have to translate C++ code to C code as the new target won't have a C++ compiler. I am running into a strange thing, where a mathematical equation gives different results when run in a C program compared to when run in a C++ program.

Equation:

float result = (float)(a+b-c+d+e);

The elements of the equation are all floats. I check the contents of the memory of each element by using

printf("a : 0x%02X%02X%02X%02X\n",((unsigned char*)&a)[0],((unsigned char*)&a)[1],((unsigned char*)&a)[2],((unsigned char*)&a)[3]);

Both in C and in C++, a b c d and e are equal, but the results are different.

Sample of a calculation in C:

a : 0x1D9969BB
b : 0x6CEDC83E
c : 0xAC89452F
d : 0xD2DC92B3
e : 0x4FE9F23C
result : 0xCD48D63E

And a sample in C++:

a : 0x1D9969BB
b : 0x6CEDC83E
c : 0xAC89452F
d : 0xD2DC92B3
e : 0x4FE9F23C
result : 0xCC48D63E

When I separate the equation in smaller parts, as in r = a + b then r = r - c and so on, the results are equal. I have a 64-bit Windows machine.

Can someone explain why this happens? I am sorry for this noob question, I am just starting out.

EDIT

I use the latest version of MinGW with options

-O0 -g3 -Wall -c -fmessage-length=0

EDIT 2

Sorry for the long time...

Here are the values corresponding to the above hex ones in C:

a : -0.003564424114301801
b : 0.392436385154724120
c : 0.000000000179659565
d : -0.000000068388217755
e : 0.029652265831828117
r : 0.418524175882339480

And here are for C++:

a : -0.003564424114301801
b : 0.392436385154724120
c : 0.000000000179659565
d : -0.000000068388217755
e : 0.029652265831828117
r : 0.418524146080017090

They are printed like printf("a : %.18f\n",a);

The values are not known at compile time, the equation is in a function called multiple times throughout the execution. The elements of the equation are computed inside the function.

Also, I observed a strange thing: I ran the exact equation in a new "pure" project (for both C and C++), i.e. only the main itself. The values of the elements are the same as the ones above (in float). The result is r : 0xD148D63E for both. The same as in @geza 's comment.

Are all your variables initially `float`? Can you please share a [MCVE]? — François Andrieux, Mar 14 '19 at 15:06
It may be because of the evaluation order since floating point math is not associative. — eozd, Mar 14 '19 at 15:07
In case like this, I would see how the complier generates the assembly. GCC has `-s` option for that. But in general, you should not be suprised if you have read Goldberg's [paper](https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf) — adrtam, Mar 14 '19 at 15:07
Please create a [mcve] to show us. Including showing the initialization of all the variables you show and use. — Some programmer dude, Mar 14 '19 at 15:07
Also, if you print the values *as floating point* values, what does it say then? — Some programmer dude, Mar 14 '19 at 15:08
@Someprogrammerdude Should I provide a complete initialisation for each variable or is that enough. Sorry... — Eduard Palko Mate, Mar 14 '19 at 15:14
The edited question helps, but it's not a [MCVE]. We should be able to take the code from your question, paste it in a source file and reproduce your results. — François Andrieux, Mar 14 '19 at 15:14
Because floats are stored reverse on intel, the difference is in the lower bits of the float. Are both results obtained on the same platform and only a different compiler? You will need to analyse the assembler generated (use the debugger). Possibly there is a different order of evaluation, or a different way/order of converting floats to double to floats again. — Paul Ogilvie, Mar 14 '19 at 15:21
Don't ask if it is enough. Just provide everything you have including the initialization code. — Bumsik Kim, Mar 14 '19 at 15:25
This could be a good question but the OP is not providing any responses to the comments. Voting to close as unclear. — Bathsheba, Mar 14 '19 at 15:30
on my computer, the result has LSB of 0xD1 (not 0xCC/0xCD). There's something going on with your compilation/compiler/rounding mode. — geza, Mar 14 '19 at 15:39
Are the values known at compile time? Will the optimizer simply calculate the result and return that? My guess is the compiler optimizer calculates this at compile time and C and C++ use different order of operations while optimizing leading to different intermediate results. — Goswin von Brederlow, Mar 14 '19 at 15:44
Until a [mcve] is posted, we can offer fantasies, idle chat, and baseless speculations. — n. m. could be an AI, Mar 14 '19 at 17:48
`I have a 64-bit Windows machine` but you use mingw which is a 32-bit compiler. It's far worse and outdated than mingw64 which is a different compiler despite the name. Using `-mpc32` is a bad idea when you need `double`. By default double should be used because `float` has too few significant bits to be useful — phuclv, Mar 15 '19 at 14:07

Michael Veksler · Accepted Answer · 2019-03-16T07:15:59.223

Introduction: Given that the question is not detailed enough, I am left to speculate the infamous gcc's 323 bug. As the low bug-ID suggests, this bug has been there forever. The bug report existed since June 2000, currently has 94 (!) duplicates, and the last one reported only half a year ago (on 2018-08-28). The bug affects only 32 bit executable on Intel computers (like cygwin). I assume that OP's code uses x87 floating point instructions, which are the default for 32 bit executables, while SSE instructions are only optional. Since 64 bit executables are more prevalent than 32, and no longer depend on x87 instructions, this bug has zero chance of ever being fixed.

Bug description: The x87 architecture has 80 bit floating point registers. The float requires only 32 bits. The bug is that x87 floating point operations are always done with 80 bits accuracy (subject to hardware configuration flag). This extra accuracy makes precision very flaky, because it depends on when the registers are being spilled (written) to memory.

If a 80 bit register is spilled into a 32 bit variable in memory, then extra precision is lost. This is the correct behavior if this happened after each floating point operation (since float is supposed to be 32 bits). However, spilling to memory slows things down and no compiler writer wants the executable to run slow. So by default the values are not spilled to memory.

Now, sometimes the value is spilled to memory and sometimes it is not. It depends on optimization level, on compiler heuristics, and on other seemingly random factors. Even with -O0 there could be slightly different strategies for dealing with spilling the x87 registers to memory, resulting in slightly different results. The strategy of spilling is probably the difference between your C and C++ compilers that you experience.

Work around: For ways to handle this, please read c handling of excess precision. Try running your compiler with -fexcess-precision=standard and compare it with -fexcess-precision=fast. You can also try playing with -mfpmath=sse.

NOTE: According to the C++ standard this is not really a bug. However, it is a bug according to the documentation of GCC which claims to follow the IEEE-754 FP standard on Intel architectures (like it does on many other architectures). Obviously bug 323 violates the IEE-754 standard.

NOTE 2: On some optimization levels -fast-math is invoked, and all bets are off regarding to extra precision and evaluation order.

EDIT I have simulated the described behavior on an intel 64-bit system, and got the same results as the OP. Here is the code:

int main()
{
    float a = hex2float(0x1D9969BB);
    float b = hex2float(0x6CEDC83E);
    float c = hex2float(0xAC89452F);
    float d = hex2float(0xD2DC92B3);
    float e = hex2float(0x4FE9F23C);
    float result = (float)((double)a+b-c+d+e);
    print("result", result);
    result = flush(flush(flush(flush(a+b)-c)+d)+e);
    print("result2", result);
}

The implementations of the support functions:

float hex2float(uint32_t num)
{
    uint32_t rev = (num >> 24) | ((num >> 8) & 0xff00) | ((num << 8) & 0xff0000) | (num << 24);
    float f;
    memcpy(&f, &rev, 4);
    return f;
}
void print(const char* label, float val)
{
    printf("%10s (%13.10f) : 0x%02X%02X%02X%02X\n", label, val, ((unsigned char*)&val)[0],((unsigned char*)&val)[1],((unsigned char*)&val)[2],((unsigned char*)&val)[3]);
}
float flush(float x)
{
    volatile float buf = x;
    return buf;
}

After running this I have got exactly the same difference between the results:

  result ( 0.4185241461) : 0xCC48D63E
 result2 ( 0.4185241759) : 0xCD48D63E

For some reason this is different than the "pure" version described at the question. At one point I was also getting the same results as the "pure" version, but since then the question has changed. The original values in the original question were different. They were:

float a = hex2float(0x1D9969BB);
float b = hex2float(0x6CEDC83E);
float c = hex2float(0xD2DC92B3);
float d = hex2float(0xA61FD930);
float e = hex2float(0x4FE9F23C);

and with these values the resulting output is:

   result ( 0.4185242951) : 0xD148D63E
  result2 ( 0.4185242951) : 0xD148D63E

Thanks Michael ! Very useful info which would make sense in my case. However, shouldn't then the results be different as well for the "pure" version, in my last edit? — Eduard Palko Mate, Mar 15 '19 at 10:44
the solution is to use SSE instead of x87. It's faster and more consistent. There's no reason to use x87 except when you want to use the extended precision[ — phuclv, Mar 15 '19 at 13:59
@EduardPalkoMate it seems that your input data has changed between the builds. I have added a description to that in the question. — Michael Veksler, Mar 16 '19 at 07:16
@phuclv due to ABI compatibility, the FP values on 32 bit x86 systems are passed through x87 registers. Since the x87 registers are already used, then compilers tend to perform computations of x87 rather than SSE on these systems. The flag exists for that reason, since it breaks the ABI. — Michael Veksler, Mar 16 '19 at 07:21

score 5 · Answer 2 · answered Mar 14 '19 at 16:06

The C and C++ standards both permit floating-point expressions to be evaluated with more precision than the nominal type. Thus, a+b-c+d+e may be evaluated using double even though the types are float, and the compiler may optimize the expression in other ways. In particular, using exact mathematics is essentially using an infinite amount of precision, so the compiler is free to optimize or otherwise rearrange the expression based on mathematical properties rather than floating-point arithmetic properties.

It appears, for whatever reason, your compiler is choosing to use this liberty to evaluate the expression differently in different circumstances (which may be related to the language being compiled or due to other variations between your C and C++ code). One may be evaluating (((a+b)-c)+d)+e while the other does (((a+b)+d)+e)-c, or other variations.

In both languages, the compiler is required to “discard” the excess precision when a cast or assignment is performed. So you can compel a certain evaluation by inserting casts or assignments. Casts would make a mess of the expression, so assignments may be easier to read:

float t0 = a+b;
float t1 = t0-c;
float t2 = t1+d;
float result = t2+e;

Explicitly typing parantheses didn't have any effect. Separating the equation did indeed produce the same results, but still different from the results obtained in the "pure" version (see my last edit). — Eduard Palko Mate, Mar 15 '19 at 09:25

Float operation difference in C vs C++

2 Answers2