90

I tried to check out where float loses the ability to exactly represent large integer numbers. So I wrote this little snippet:

int main() {
    for (int i=0; ; i++) {
        if ((float)i!=i) {
            return i;
        }
    }
}

This code seems to work with all compilers, except clang. Clang generates a simple infinite loop. Godbolt.

Is this allowed? If yes, is it a QoI issue?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
geza
  • 28,403
  • 6
  • 61
  • 135
  • @geza I would be interested to hear the resulting number! – nada Jul 12 '19 at 09:48
  • 5
    `gcc` does the same infinite loops optimization if you compile with `-Ofast` instead, so it's an optimization `gcc` deems unsafe, but it can do it. – 12345ieee Jul 12 '19 at 17:26
  • See [this question](https://stackoverflow.com/questions/12596695/why-does-a-float-variable-stop-incrementing-at-16777216-in-c) too. – Calvin Godfrey Jul 12 '19 at 18:10
  • 3
    g++ also generates an infinite loop, but it doesn't optimize away the work from inside it. You can see it does `ucomiss xmm0,xmm0` to compare `(float)i` with itself. That was your first clue that your C++ source doesn't mean what you thought it did. Are you claiming you got this loop to print / return `16777216`? What compiler/version/options was that with? Because that would be a compiler bug. gcc correctly optimizes your code to `jnp` as the loop branch (https://godbolt.org/z/XJYWeu) : keep looping as long as the operands to `!=` weren't NaN. – Peter Cordes Jul 13 '19 at 03:05
  • 4
    Specifically, it's the `-ffast-math` option that is implicitly enabled by `-Ofast` that allows GCC to apply unsafe floating-point optimizations and thus generate the same code as Clang. MSVC behaves exactly the same way: without `/fp:fast`, it generates a bunch of code that results in an infinite loop; with `/fp:fast`, it emits a single `jmp` instruction. I'm assuming that without explicitly turning on unsafe FP optimizations, these compilers get hung up on the IEEE 754 requirements regarding NaN values. Rather interesting that Clang doesn't, actually. Its static analyzer is better. @12345ieee – Cody Gray - on strike Jul 13 '19 at 05:21
  • 1
    @geza: If the code did what you intended, checking for when the mathematical value of `(float) i` differed from the mathematical value of `i`, then the result (the value returned in the `return` statement) would be 16,777,217, not 16,777,216. – Eric Postpischil Jul 13 '19 at 11:26
  • @geza : maybe you used an old Pentium from 1995? – vsz Jul 14 '19 at 09:28
  • The arguments to != must be promoted to the same type. Not sure if (float)i and i may, may not, or must be both cast to double. – gnasher729 Jul 14 '19 at 18:26

2 Answers

64

Note that the built-in operator != requires its operands to be of the same type, and will achieve that using the usual arithmetic conversions if necessary. In other words, your condition is equivalent to:

(float)i != (float)i

That comparison can never be true: converting an int to float never produces a NaN, and any non-NaN value compares equal to itself. So the loop never exits, i eventually overflows, giving your program Undefined Behaviour. Any behaviour is therefore possible.

To correctly check what you want to check, you should cast the result back to int:

if ((int)(float)i != i)
Angew is no longer proud of SO
  • 167,307
  • 17
  • 350
  • 455
  • 8
    @Džuris It's UB. There _is_ no one definite result. The compiler might realize that it can only end in UB and decide to remove the loop entirely. – Nic Jul 12 '19 at 19:01
  • 4
    @opa do you mean `static_cast<int>(static_cast<float>(i))`? `reinterpret_cast` is obvious UB there – Caleth Jul 12 '19 at 22:33
  • 6
    @NicHartley: Are you saying `(int)(float)i != i` is UB? How do you conclude that? Yes it depends on *implementation defined* properties (because `float` isn't required to be IEEE754 binary32), but on any given implementation it's well-defined unless `float` can exactly represent all positive `int` values so we get signed-integer overflow UB. (https://en.cppreference.com/w/cpp/types/climits defines `FLT_RADIX` and `FLT_MANT_DIG` determine this). In general printing implementation-defined things, like `std::cout << sizeof(int)` is not UB... – Peter Cordes Jul 13 '19 at 02:53
  • 2
    @Caleth: `reinterpret_cast<int>(float)` is not exactly UB, it's just a syntax error / ill-formed. It would be nice if that syntax allowed for type-punning of float to `int` as an alternative to `memcpy` (which is well-defined), but `reinterpret_cast<>` only works on pointer types, I think. – Peter Cordes Jul 13 '19 at 02:57
  • @PeterCordes No, looping forever until you overflow the integers is UB, so the compiler assumes it won't happen. Because the condition never triggers, the break never triggers, so it _would_ loop forever -- which would trigger UB, so the compiler might well just optimize out the entire loop. – Nic Jul 13 '19 at 03:21
  • @NicHartley: oh, so you're talking about the OP's original `(float)i != i` code. Yes that obviously hits UB because `x != x` is always false, even for NaN. But I think \@Džuris was asking about the result with the bug fix, and that's what \@Silvio Mayolo was replying about. (where it gives `16777217` on C++ implementations that use float = IEEE754 single-precision). My previous comment was about that fixed version, which would only be UB on some C++ implementations with large `float` (e.g. IEEE binary64) and / or narrow `int` (16-bit). – Peter Cordes Jul 13 '19 at 03:31
  • 1
    @PeterCordes Yep, I was talking about the original code, sorry. In retrospect I realize that wasn't clear. – Nic Jul 13 '19 at 03:35
  • @PeterCordes You can type-pun with `*reinterpret_cast<int*>(&some_float)`, which is valid syntax but undefined behavior. Stroustrup’s C++ guidelines actually recommend this over type-punning through a `union`, on the ground that this is a huge red flag that the program has undefined behavior, whereas it’s hard to spot that the object being read from a `union` is not its active member. The supported method is of course `memcpy()`, but C++20 will introduce `std::bit_cast`. – Davislor Jul 13 '19 at 12:35
  • @Davislor: yeah, that's why I didn't mention it. Only MSVC, or `gcc -fno-strict-aliasing`, define the behaviour of that kind of pointer-casting. GNU C++ and ISO C99 define the behaviour of union type-punning, but yeah it's more syntax. I would never recommend pointer-cast type punning ever. It's always strict-aliasing UB. – Peter Cordes Jul 13 '19 at 12:39
  • 2
    @Peter Just for NaN, `x != x` is true. [See live on coliru](http://coliru.stacked-crooked.com/a/e36c8cbc7095e9b4). In C too. – Deduplicator Jul 13 '19 at 19:00
  • On a system where `FLT_EVAL_METHOD` is positive, would the `int` be truncated to the precision of `float`, or the precision of the type used for floating-point operations? – supercat Jul 13 '19 at 23:10
  • 2
    @Deduplicator: yes, thanks, I fixed that mistake from my earlier comment when writing my answer. I tested myself on Godbolt with compile-time-constant NAN. I wish I could edit old comments :P I was thinking that unordered meant that any predicate was false, but apparently `!=` works like `!(x == x)` instead of being its own positive assertion. – Peter Cordes Jul 14 '19 at 02:28
50

As @Angew pointed out, the != operator needs the same type on both sides. (float)i != i results in promotion of the RHS to float as well, so we have (float)i != (float)i.


g++ also generates an infinite loop, but it doesn't optimize away the work from inside it. You can see it converts int->float with cvtsi2ss and does ucomiss xmm0,xmm0 to compare (float)i with itself. (That was your first clue that your C++ source doesn't mean what you thought it did, as @Angew's answer explains.)

x != x is only true when it's "unordered" because x was NaN. (INFINITY compares equal to itself in IEEE math, but NaN doesn't. NAN == NAN is false, NAN != NAN is true).

gcc7.4 and older correctly optimizes your code to jnp as the loop branch (https://godbolt.org/z/fyOhW1): keep looping as long as the operands to x != x weren't NaN. (gcc8 and later also emits a je to break out of the loop, failing to optimize based on the fact that the comparison will always be true for any non-NaN input.) x86 FP compares set PF on unordered.


And BTW, that means clang's optimization is also safe: it just has to CSE (float)i != (implicit conversion to float)i as being the same, and prove that i -> float is never NaN for the possible range of int.

(Although given that this loop will hit signed-overflow UB, it's allowed to emit literally any asm it wants, including a ud2 illegal instruction, or an empty infinite loop regardless of what the loop body actually was.) But ignoring the signed-overflow UB, this optimization is still 100% legal.


GCC fails to optimize away the loop body even with -fwrapv to make signed-integer overflow well-defined (as 2's complement wraparound). https://godbolt.org/z/t9A8t_

Even enabling -fno-trapping-math doesn't help. (GCC's default is unfortunately to enable
-ftrapping-math even though GCC's implementation of it is broken/buggy.) int->float conversion can cause an FP inexact exception (for numbers too large to be represented exactly), so with exceptions possibly unmasked it's reasonable not to optimize away the loop body. (Because converting 16777217 to float could have an observable side-effect if the inexact exception is unmasked.)

But with -O3 -fwrapv -fno-trapping-math, it's 100% missed optimization not to compile this to an empty infinite loop. Without #pragma STDC FENV_ACCESS ON, the state of the sticky flags that record masked FP exceptions is not an observable side-effect of the code. No int->float conversion can result in NaN, so x != x can't be true.


These compilers are all optimizing for C++ implementations that use IEEE 754 single-precision (binary32) float and 32-bit int.

The bugfixed (int)(float)i != i loop would have UB on C++ implementations with narrow 16-bit int and/or wider float, because you'd hit signed-integer overflow UB before reaching the first integer that wasn't exactly representable as a float.

But UB under a different set of implementation-defined choices doesn't have any negative consequences when compiling for an implementation like gcc or clang with the x86-64 System V ABI.


BTW, you could statically calculate the result of this loop from FLT_RADIX and FLT_MANT_DIG, defined in <cfloat>. Or at least you can in theory, if float actually fits the model of an IEEE float rather than some other kind of real-number representation like a Posit / unum.

I'm not sure how much the ISO C++ standard nails down about float behaviour and whether a format that wasn't based on fixed-width exponent and significand fields would be standards compliant.


In comments:

@geza I would be interested to hear the resulting number!

@nada: it's 16777216

Are you claiming you got this loop to print / return 16777216?

Update: since that comment has been deleted, I think not. Probably the OP is just quoting the float before the first integer that can't be exactly represented as a 32-bit float. https://en.wikipedia.org/wiki/Single-precision_floating-point_format#Precision_limits_on_integer_values i.e. what they were hoping to verify with this buggy code.

The bugfixed version would of course print 16777217, the first integer that's not exactly representable, rather than the value before that.

(All the higher float values are exact integers, but they're multiples of 2, then 4, then 8, etc. for exponent values higher than the significand width. Many higher integer values can be represented, but 1 unit in the last place (of the significand) is greater than 1 so they're not contiguous integers. The largest finite float is just below 2^128, which is too large for even int64_t.)

If any compiler did exit the original loop and print that, it would be a compiler bug.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • 3
    @SombreroChicken: no, I learned electronics first (from some textbooks my dad had lying around; he was a physics professor), then digital logic and got into CPUs/software after that. :P So pretty much I've always liked understanding things from the ground up, or if I start with a higher level then I like to learn at least something about the level below that influences how/why things work in the level I'm thinking about. (e.g. how asm works and how to optimize it is influenced by CPU design constraints / cpu-architecture stuff. Which in turn comes from physics + math.) – Peter Cordes Jul 13 '19 at 11:34
  • 1
    GCC might not be able to optimize even with `-fwrapv`, but I'm sure that GCC 10's [`-ffinite-loops`](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-ffinite-loops) was designed for situations like this. – MCCCS Jul 13 '19 at 15:04