Is Clang really this smart?

Question

If I compile the following code with Clang 3.3 using -O3 -fno-vectorize, I get the same assembly output even if I remove the commented line. The code type-puns every possible 32-bit integer to a float and counts the ones that fall in the range [0, 1]. Is Clang's optimizer actually smart enough to realize that 0xFFFFFFFF, when punned to float, is not in the range [0, 1], and therefore to drop the second call to fn entirely? GCC produces different code when the second call is removed.

#include <limits>
#include <cstring>
#include <cstdint>

template <class TO, class FROM>
inline TO punning_cast(const FROM &input)
{
    TO out;
    std::memcpy(&out, &input, sizeof(TO));
    return out;
}

int main()
{
    uint32_t count = 0;

    auto fn = [&count] (uint32_t x) {
        float f = punning_cast<float>(x);
        if (f >= 0.0f && f <= 1.0f)
            count++;
    };

    for(uint32_t i = 0; i < std::numeric_limits<uint32_t>::max(); ++i)
    {
        fn(i);
    }
    fn(std::numeric_limits<uint32_t>::max()); //removing this changes nothing

    return count;
}

See here: http://goo.gl/YZPw5i

http://stackoverflow.com/questions/23838661/why-is-clang-optimizing-this-code-out — rici, May 29 '14 at 05:33
Clang has a habit of massively optimizing out constant-only functions (effectively doing a quite sophisticated constant folding on them). [Figure 1.](http://stackoverflow.com/questions/15114140/writing-binary-number-system-in-c-code) — The Paramagnetic Croissant, May 29 '14 at 05:40
I think the main point is how deeply the compiler understands the internal operations of memcpy. — 9dan, May 29 '14 at 05:45
@9dan Nowadays in modern C libraries and compilers, `memcpy` is almost always a compiler intrinsic function. — The Paramagnetic Croissant, May 29 '14 at 05:46
@9dan: It's less "understanding the internals of memcpy" (which might require the compiler to understand the hand-optimized library implementation), and more "understanding the intended function of memcpy". C/C++ essentially allows the compiler to perform any optimization it likes, provided the observable result is unchanged with respect to the specification. Since `memcpy` is specified by C/C++, it can in principle be optimized in any way, provided the result is the same. — nneonneo, May 29 '14 at 05:58

score 11 · Accepted Answer · answered May 29 '14 at 05:28

Yes, it looks like Clang really is this smart.

Test:

#include <limits>
#include <cstring>
#include <cstdint>

template <class TO, class FROM>
inline TO punning_cast(const FROM &input)
{
    TO out;
    std::memcpy(&out, &input, sizeof(TO));
    return out;
}

int main()
{
    uint32_t count = 0;

    auto fn = [&count] (uint32_t x) {
        float f = punning_cast<float>(x);
        if (f >= 0.0f && f <= 1.0f)
            count++;
    };

    for(uint32_t i = 0; i < std::numeric_limits<uint32_t>::max(); ++i)
    {
        fn(i);
    }
#ifdef X
    fn(0x3f800000); /* 1.0f */
#endif

    return count;
}

Result:

$ c++ -S -DX -O3 foo.cpp -std=c++11 -o foo.s
$ c++ -S -O3 foo.cpp -std=c++11 -o foo2.s
$ diff foo.s foo2.s
100d99
<   incl    %eax

Observe that Clang has converted the call to fn(0x3f800000) into simply an increment instruction, since the value decodes to 1.0. This is correct.

My guess is that Clang is tracing the function calls because they only involve constants, and that Clang is capable of tracing memcpy through type-punning (probably by simply emulating its effect on the constant value).

In that case I am almost surprised that Clang doesn't compile the whole thing to `movl $1065353217, %eax` — Chris_F, May 29 '14 at 05:40
@Chris_F: I suspect that would take excessively long. Clang likely has some heuristic limits on the amount of tracing it is willing to do (otherwise compile times could easily go through the roof for no clear benefit). — nneonneo, May 29 '14 at 05:45
Ah, that makes a lot of sense now that I try thinking about it. It takes several seconds to run on a fast processor, so if it did this kind of thing everywhere, nothing would ever finish compiling. — Chris_F, May 29 '14 at 05:47
