Clang 14 and 15 apparently optimizing away code that compiles as expected under Clang 13, ICC, GCC, MSVC

Question

I have the following sample code:

inline float successor(float f, bool const check)
{
    const unsigned long int mask = 0x7f800000U;
    unsigned long int i = *(unsigned long int*)&f;

    if (check)
    {
        if ((i & mask) == mask)
            return f;
    }

    i++;

    return *(float*)&i;
}

float next1(float a)
{
    return successor(a, true);
}

float next2(float a)
{
    return successor(a, false);
}

Under x86-64 clang 13.0.1, the code compiles as expected.

Under x86-64 clang 14.0.0 or 15, the output is merely a ret op for next1(float) and next2(float).

Compiler options: -march=x86-64-v3 -O3

The code and output are here: Godbolt.

The successor(float,bool) function is not a no-op.

As a note, the output is as expected under GCC, ICC, and MSVC. Am I missing something here?

Btw. if you are trying to get the next higher/lower `float` value, there are already standard library functions for that: https://en.cppreference.com/w/cpp/numeric/math/nextafter — user17732522, Sep 25 '22 at 09:46
BTW, there's a standard C function for almost this, [`nextafterf`](https://en.cppreference.com/w/c/numeric/math/nextafter). But it's slower than this because it needs an FP compare to find the direction. You're unconditionally increasing the magnitude, not moving toward a target (typically + or -Inf, or 0). So it could be implemented with an x86-64 `paddd` or `psubd` SIMD integer addition or subtraction for the unchecked case. (`psubd` is useful to take advantage of generating `_mm_set1_epi32(-1)` with `pcmpeqd xmm1,xmm1`) — Peter Cordes, Sep 27 '22 at 21:05
@PeterCordes I implemented a vector version using 1) cast to int32, 2) increment, 3) cast to float. Only 4 clocks for the process. — IamIC, Oct 11 '22 at 09:58
Yeah, for sure. It's literally one `vpaddd` instruction to increment the magnitude of an IEEE float, which should have 1 cycle latency itself, but maybe an extra 1 cycle of bypass forwarding latency in and out, if used between two FP math instructions. (https://agner.org/optimize/) But still a throughput of 3/clock on Skylake for example (per vector of 4 or 8 floats.) — Peter Cordes, Oct 11 '22 at 10:03
@PeterCordes absolutely. I was including the finite validation code portion in the 4 clocks. — IamIC, Oct 12 '22 at 13:17
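
A minimal sketch of the single-instruction idea from the comments above, assuming an x86-64 target with SSE2 and IEEE 754 binary32 floats. The name `successor4` is hypothetical, and the sketch covers only the unchecked case (no Inf/NaN test):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper (not from the original post): increase the magnitude
// of four packed floats by one ULP each, with no Inf/NaN check.
inline __m128 successor4(__m128 v)
{
    __m128i bits = _mm_castps_si128(v);          // reinterpret only, no conversion
    __m128i ones = _mm_cmpeq_epi32(bits, bits);  // every lane becomes -1 (pcmpeqd)
    bits = _mm_sub_epi32(bits, ones);            // x - (-1) == x + 1 (psubd)
    return _mm_castsi128_ps(bits);               // reinterpret back to float lanes
}
```

Subtracting the all-ones vector is the `psubd` trick mentioned above: the constant comes from comparing a register with itself, so no load from memory is needed.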

user17732522 · Accepted Answer · 2022-10-03T16:30:48.193

14

*(unsigned long int*)&f is an immediate aliasing violation. f is a float. You are not allowed to access it through a pointer to unsigned long int. (And the same applies to *(float*)&i.)

So the code has undefined behavior and Clang likes to assume that code with undefined behavior is unreachable.

Compile with -fno-strict-aliasing to force Clang to not consider aliasing violations as undefined behavior that cannot happen (although that is probably not sufficient here, see below) or better do not rely on undefined behavior. Instead use either std::bit_cast (since C++20) or std::memcpy to create a copy of f with the new type but same object representation. That way your program will be valid standard C++ and not rely on the -fno-strict-aliasing compiler extension.

(And if you use std::memcpy add a static_assert to verify that unsigned long int and float have the same size. That is not true on all platforms and also not on all common platforms. std::bit_cast has the test built-in.)
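
A minimal sketch of the `std::memcpy` approach described above, assuming `float` is a 32-bit IEEE 754 type. The name `successor_memcpy` is hypothetical, and swapping `unsigned long int` for `std::uint32_t` is a change to the question's code, made so that the `static_assert` holds on targets where `long` is 64 bits:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical rewrite of the question's successor() without pointer casts.
inline float successor_memcpy(float f, bool check)
{
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "float is expected to be 32 bits wide");

    const std::uint32_t mask = 0x7f800000U;  // exponent bits of a binary32 float
    std::uint32_t i;
    std::memcpy(&i, &f, sizeof i);           // copy the object representation

    if (check && (i & mask) == mask)         // Inf or NaN: return unchanged
        return f;

    ++i;

    float result;
    std::memcpy(&result, &i, sizeof result);
    return result;
}
```

Optimizing compilers typically turn these fixed-size `memcpy` calls into plain register moves, so the well-defined version should not cost anything over the cast.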

As noticed by @CarstenS in the other answer, given that you are (at least on Compiler Explorer) compiling for the SysV ABI, unsigned long int (64-bit) is indeed a different size than float (32-bit). Consequently, there is much more direct UB in that you are accessing memory out of bounds in the initialization of i. And as he also noticed, Clang does seem to compile the code as intended when an integer type of matching size is used, even without -fno-strict-aliasing. This does not invalidate what I wrote above in general, though.
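
The size mismatch described above is the kind of error a compile-time check catches. Here is a minimal sketch of a checked copy helper that works before C++20. The names `checked_bit_copy` and `float_bits` are hypothetical and not standard library functions, and the bit pattern in the comment assumes IEEE 754 binary32:

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copy the object representation of src into a To,
// refusing to compile when the sizes differ.
template <class To, class From>
To checked_bit_copy(const From& src)
{
    static_assert(sizeof(To) == sizeof(From),
                  "source and destination types must have the same size");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To dst;
    std::memcpy(&dst, &src, sizeof(To));
    return dst;
}

// float_bits(1.0f) yields 0x3f800000 under IEEE 754 binary32.
inline std::uint32_t float_bits(float f)
{
    return checked_bit_copy<std::uint32_t>(f);
}
```

With this helper, `checked_bit_copy<unsigned long>(f)` for a `float f` would be rejected at compile time on a target where `unsigned long` is 8 bytes, rather than reading past the end of `f`.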

edited Oct 03 '22 at 16:30

answered Sep 25 '22 at 09:35

user17732522


Err... I wonder how network functions like `bind` work under this rule because `sockaddr* addr` is sometimes cast from other structure pointers, and it seems inevitable to use `addr->...` to implement it. Or is there any way to suppress strict aliasing rule in specific fields? – o_oTurtle Sep 25 '22 at 11:27
@o_oTurtle The aliasing rule applies only to scalar objects (in C++, but not C). However, there is another rule that forbids member access through a non-similar type. The behavior required by POSIX can not be implemented in standard C++ (or C for that matter). A compiler supporting this interface must define some of the undefined behavior or implement the interface with compiler extensions/magic, but they do not need to (and typically don't) do that for access to scalars with "wrong" types. The interface does not require such an access, only member access through "wrong" structure pointers. – user17732522 Sep 25 '22 at 12:42
@o_oTurtle For an in-depth explanation of this issue, see e.g. https://stackoverflow.com/questions/42178179/will-casting-around-sockaddr-storage-and-sockaddr-in-break-strict-aliasing. – user17732522 Sep 25 '22 at 12:43
(Addendum: With "standard" C++ or C in the above I mean free of undefined behavior according to the respective ISO standard.) – user17732522 Sep 25 '22 at 12:54
@njuffa: The behavior would also be well-defined on compiler configurations which are designed to be suitable for low-level programming. – supercat Sep 27 '22 at 22:01
@o_oTurtle: The type aliasing rules in the Standard were intended to allow compilers to make optimizations that might affect some corner-case behaviors, *in cases where doing so would not interfere with what their customers were trying to do*. Because different programs would need to do different things, and because anyone wanting to sell compilers would refrain from gratuitously breaking their customers' code, the Committee waived jurisdiction over when implementations intended for various purposes should support constructs that would facilitate them. – supercat Sep 27 '22 at 22:06
"[...] add a `static_assert` to verify that `unsigned long int` and `float` have the same size. That is not true on all platforms and also not on all common platforms." In particular it is not true on the target platform in the example, which is the reason for clang's behaviour. I think clang has actually improved here, but I would have liked a warning accompanying the change in behaviour. – Carsten S Oct 03 '22 at 15:50
@CarstenS Right, I don't know why that slipped past me. Clang also supports the MSVC ABI, in which case the sizes would match, but that is at least not the case on compiler explorer. – user17732522 Oct 03 '22 at 16:20
Well, you read what was intended. I also only noticed after actually trying a `memcpy` variant and getting a warning (will always overflow). – Carsten S Oct 03 '22 at 16:41

Carsten S · Answer 2 · 2022-10-03T11:55:41.467

2

Standards and UB aside, on your target platform float is 32 bits and long is 64 bits, so I am surprised by the clang 13 code (indeed I think you will get actual UB with -O0). If you use uint32_t instead of long, the problem goes away.

edited Oct 03 '22 at 11:55

answered Oct 03 '22 at 11:49

Carsten S


I believe `long` is 32 bits. See https://www.geeksforgeeks.org/c-data-types/ – IamIC Oct 04 '22 at 09:50
@IamIC, it depends on the target architecture. In your compiler explorer example `long` is 64 bit as you can easily check. – Carsten S Oct 04 '22 at 09:57
From https://learn.microsoft.com/en-us/cpp/cpp/fundamental-types-cpp?view=msvc-170 "In particular, long is 4 bytes even on 64-bit operating systems." – IamIC Oct 04 '22 at 12:31
@IamIC, you have not compiled for Windows with Clang on Compiler Explorer. Also see the addendum to the accepted answer. – Carsten S Oct 04 '22 at 13:06
"Most built-in types have implementation-defined sizes. The following table lists the amount of storage required for built-in types in Microsoft C++." These sizes just do not apply here. – Carsten S Oct 04 '22 at 13:14
I am compiling with Clang and `typedef signed long int int32; static_assert(sizeof(int32) == 4, "sizeof(int32) is expected to be 4");` compiles. – IamIC Oct 06 '22 at 14:59
@IamIC, maybe on your setup, but not on Compiler Explorer, and it’s the code generated on Compiler Explorer that we discuss. – Carsten S Oct 06 '22 at 15:28
I can't comment about Compiler Explorer. I can only comment about what I'm using, which is what I'm discussing. You are correct in that `long` has a different meaning in different systems. In Visual Studio, which is what I'm using, `long` is 32-bit under C/C++. In C#, it's 64-bit, as typically expected. – IamIC Oct 06 '22 at 15:38
I just checked Compiler Explorer, and `long` is 64-bit, as you stated. Arguably one of the challenges of posting questions here is "how complex does one make them?" I would not use ambiguous types in real code. I was merely trying to understand why the code stopped compiling with a later Clang. – IamIC Oct 06 '22 at 15:42
The code *on Compiler Explorer* stopped working with a later version, but it was wrong (and really wrong, not just wrong by a strict reading of the standard) to begin with. So there is no mystery there, just some change to the optimiser. Now if you actually experienced the same on your setup then you would have to re-investigate. – Carsten S Oct 06 '22 at 15:51
I totally agree. Thank you for your input. – IamIC Oct 06 '22 at 15:56

score 1 · Answer 3 · answered Sep 27 '22 at 22:35

Some compiler writers interpret the Standard as deprecating "non-portable or erroneous" program constructs, including constructs which implementations for commonplace hardware had to date unanimously processed in a manner consistent with implementation-defined behavioral traits such as numeric representations.

Compilers that are designed for paying customers will look at a construct like:

unsigned long int i = *(unsigned long int*)&f; // f is of type float

and recognize that while converting the address of a float to an unsigned long* is a non-portable construct, it was almost certainly written for the purpose of examining the bits of a float type. This is a very different situation from the one offered in the published Rationale as being the reason for the rule, which was more like:

int x;
int test(double *p)
{
  x = 1;
  *p = 2.0;
  return x;
}

In the latter situation, it would be theoretically possible that *p points to or overlaps x, and that the programmer knows what precedes and/or follows x in memory, and the authors of the Standard recognized that having the function unconditionally return 1 would be incorrect behavior if that were the case, but decided that there was no need to mandate support for such dubious possibilities.

Returning to the original, that represents a completely different situation since any compiler that isn't willfully blind to such things would know that the address being accessed via type unsigned long* was formed from a pointer of type float*. While the Standard wouldn't forbid compilers from being willfully blind to the possibility that a float* might actually hold the address of storage that will be accessed using type float, that's because the Standard saw no need to mandate that compiler writers do things which anyone wanting to sell compilers would do, with or without a mandate.

Probably not coincidentally, the compilers I'm aware of that would require a -fno-strict-aliasing option to usefully process constructs such as yours also require that flag in order to correctly process some constructs whose behavior is unambiguously specified by the Standard. Rather than jumping through hoops to accommodate deficient compiler configurations, a better course of action would be to simply use the "don't make buggy aliasing optimizations" option.
