
I'm making a bit of a physics engine for fun. I'm trying to make it reliable even at low tickrates, so I'm doing a dangerous dance with float arithmetic and precision.

While debugging, I ended up running this code:

#include <stdio.h>

#define A 0.2063387632369995100000000000000f
#define B 0.7307806611061096200000000000000f

int main(void)
{
    float a = A;
    float b = B;

    float floatie1 = A + (B * ((A)/(-B)));
    float floatie2 = a + (b * ((a)/(-b)));
    printf("%.40f\n", floatie1);
    printf("%.40f\n", floatie2);
    return 0;
}

The output is:

0.0000000149011611938476560000000000000000
0.0000000000000000000000000000000000000000

And the bits of each (respectively):

00110010100000000000000000000000
00000000000000000000000000000000

I get that the expression is supposed to evaluate to 0.
I don't want it to evaluate to zero, though: some other, unrelated arithmetic elsewhere in my code might produce a value that's imprecise in exactly the same way, and if I then subtract the two, the difference will be 0.
I don't want my expressions being simplified or optimized in arbitrary situations.

I've tried making the floats volatile and turning off optimizations to no avail.
I've also tried -fassociative-math, -ftrapping-math and -funsafe-math-optimizations, along with the combinations of their no- variants.

If I break up the expression, put stuff into different variables and do one thing at a time, it works, but I don't wanna have to be looking over my shoulder every time I write a piece of code.

MSVC++ gives me the correct result with this code out of the box.

Version: gcc (MinGW.org GCC-6.3.0-1) 6.3.0 on Windows 8.1.
How do I turn this off?

TrisT
  • Why don't you use a sane function for this, like `remainderf(A, B)`? – EOF May 15 '20 at 22:34
  • @EOF because that has nothing to do with what this expression (or the original expression from which this one is derived) is supposed to do. – TrisT May 15 '20 at 22:38
  • Hmm, it sure *looks* like that is what it is doing. I'm wondering whether the example you've chosen is representative of your actual problem then. – EOF May 15 '20 at 22:40
  • @EOF My problem is that the compiler is giving me a result that doesn't conform to IEEE float arithmetic (or the c++ standard according to https://stackoverflow.com/questions/34879992/does-gcc-optimize-c-code-algebraically-and-if-so-to-what-extent). And I realize it resembles a remainder, but there are 2 more variables involved on the original expression. This one is just meant to yield 0. – TrisT May 15 '20 at 22:42
  • If you exceed [max_digits10](https://en.cppreference.com/w/cpp/types/numeric_limits/max_digits10) the trailing digits will be suspect. – Eljay May 15 '20 at 22:44
  • @Eljay I can include the binary representation of each float in my post if you think it's appropriate, didn't feel the need to because the main point is that they should be the same. – TrisT May 15 '20 at 22:47
  • 2
    There are a few points to make here. First, are you compiling for 32-bit? If so, a good guess would be that the problem originates with the odious x87 FPU, which gets used in 32-bit mode by default, and which is not (sanely) ieee754-conforming. When compiling on `gcc` for x86_64 Linux, I cannot reproduce your problem. Your wonky, self-written `remainderf()` produces identical (wonky) results for compile-time and run-time computation, while library `remainderf()` produces correct results. – EOF May 15 '20 at 22:58
  • @EOF thank you, I'll do an objdump and check. I'm not familiar with the FPU instructions but it shouldn't be that big of a problem. With that said, I did break it up into different variables and it gave the correct result, I believe that might exclude the FPU being the problem. My hypothesis is that my wonky function only produces different results in compile-time vs run-time because the expression with variables is being simplified/optimized while the first is obligated to compute the actual value because I'm feeding it literals instead of variables. – TrisT May 15 '20 at 23:10
  • 2
    I'd be surprised if this *wasn't* another victim of x87 braindamage. At compile time, `gcc` honors ieee754, because why not, it doesn't cost anything then. At runtime, the x87 FPU requires a *horrific* performance cost for ieee754 conformance, because to *truly* compute at `float` precision, *every intermediate result must be stored to memory and subsequently reloaded to continue computation*. Madness, and the compiler typically only does it if it *has to*. You can force it by either storing every intermediate result to a variable (hint, hint), or passing `ffloat-store` to `gcc`. – EOF May 15 '20 at 23:16
  • @EOF you were completely right. I dumped the obj and it had a bunch of FPU instructions which I was surprised to see. Also -ffloat-store makes it work as intended. Unfortunately, I can't really use the remainder function because that's not what I'm doing in the original code. Wanna write an answer? Also, can you enable ffloat-store only for certain parts of the code? – TrisT May 15 '20 at 23:20
  • 2
    You can probably enable the `-ffloat-store`-deoptimization via either some `#pragma push #pragma GCC optimize (STRING, ...) [...] #pragma pop` or `__attribute__((optimize (STRING, ...)))` magic. – EOF May 15 '20 at 23:25
  • @EOF ``__attribute__((optimize ("-ffloat-store")))`` worked like a charm. Thank you for all this. Pretty sick that you know all this stuff. You gonna write an answer? – TrisT May 15 '20 at 23:35
  • No, I don't think I'll be writing an answer to this. If you want, you can answer it yourself. – EOF May 15 '20 at 23:36

1 Answer


All of the credit goes to EOF.

I was asking the wrong question. It's not really GCC simplifying the expressions.

Apparently, for 32-bit targets GCC defaults to keeping intermediate results in x87 FPU registers instead of storing them back to memory. That's faster, but it doesn't guarantee IEEE 754 `float` arithmetic, because under the hood the x87 registers hold values at extended (80-bit) precision (EOF gives a more detailed explanation in his comments).

The results differed because GCC used IEEE 754-compliant arithmetic to fold the first expression at compile time, while the second expression was computed at run time with the non-compliant x87 instructions.

To force every intermediate result to be stored to memory and rounded (keep in mind this will make the code slower), one can pass -ffloat-store as a compile option.

Alternatively, if you don't want your whole program to get slower, use either a pragma or a function attribute:

void __attribute__((optimize ("-ffloat-store"))) func(float a, float b){
    //your code
}

Note: instead of the x87 FPU, one can optionally target the SSE extensions (e.g. with -mfpmath=sse -msse2), which are both IEEE 754 conformant and baseline for x86_64.

TrisT
  • 2
    `ffloat-store` is not an optimization. It severely de-optimizes code in exchange for ieee754 conformance on hardware conforming to a design that effectively predates ieee754. You might want to note that you can also get ieee754 conformance by targeting the newer (also, pretty old by now) vector/floating-point unit (SSE and later), which is always part of x86_64. The default compiler target for x86_64 is ieee754 conforming without kneecapping performance. – EOF May 16 '20 at 05:48
  • @EOF edited. Sorry, I had the right idea in my head but I was quite sleep deprived at that point. Added the note too. Thank you! – TrisT May 16 '20 at 18:10
  • `-ffloat-store` doesn't *avoid* using some "faster instructions", it instead uses the same instructions, but more of them (specifically more floating-point load/store instructions). – EOF May 16 '20 at 18:12
  • @EOF I'm confused then. If it uses the same instructions how come the result is different? – TrisT May 16 '20 at 18:15
  • 1
    The problem is the x87 FPU's design. The x87 FPU consists of a stack of 8 80-bit `extended double` floating point registers. There is no way to make the x87 FPU store 32-bit `float` variables (AFAIR, it always uses the extended exponent range of 16 bits, rather than the 8 / 11 bits of `float` or `double`). The *only* way to properly convert intermediate results to ieee754-conforming `float` is to store the value to memory (where it *is* ieee754 format), and reload the value, so it no longer has the extended precision/range. – EOF May 16 '20 at 18:27
  • @EOF godlike. Edited once again. I gave a layman's explanation and linked to your comment for the sake of not making the answer too cluttered. I've just looked for it in the wikipedia page and it did mention it there, so it was dumb of me to not read all of it. Thank you for guiding me through this. – TrisT May 16 '20 at 18:41