Why is the compiler allowed to optimize away a `std::for_each` loop with in-place modification?

Question

A recent bug hunt of mine boiled down to the compiler effectively removing some lines of code when optimizations are turned on. I would like to understand which parts of the specifications of C++ or the standard library make these optimizations valid.

The following code snippet, also available on Godbolt, illustrates the issue:

#include <iostream>
#include <algorithm>
#include <cstdint>

uint32_t incrementWithReference() {
    uint32_t input = 0;

    auto* begin = reinterpret_cast<uint16_t*>(&input);
    auto* end = begin + 2;
    //std::cout << "[DEBUG: begin = " << *begin << "] ";
    std::for_each(begin, end, [](uint16_t &n){ ++n; });

    return input;
}

uint32_t incrementWithCopy() {
    uint32_t input = 0;
    uint32_t output;

    auto* begin = reinterpret_cast<uint16_t*>(&input);
    auto* end = begin + 2;
    auto* dest = reinterpret_cast<uint16_t*>(&output);
    std::transform(begin, end, dest, [](auto n) { return ++n;});

    return output;
}

int main() {
   std::cout << "withReference: " << incrementWithReference() << std::endl;
   std::cout << "withCopy: " << incrementWithCopy() << std::endl;
}

Both functions interpret a uint32_t as two uint16_t values and increment each of them; the result is then read back as a uint32_t and returned. The first function increments in place, calling std::for_each with a lambda that takes its argument by reference. The second function uses std::transform instead, writing the incremented values into a separate uint32_t.

With Clang 15 at -O0, or GCC 12 at -O0 or -O1, both functions return the expected value of 65537 = 2^16 + 2^0. At higher optimization levels, lines 8-11 of the snippet (the pointer setup and the std::for_each call in incrementWithReference) seem to be ignored, and the first function returns 0.

Looking for clues as to why such an optimization is permitted, I found on cppreference:for_each that the signature of the function passed should be equivalent to accepting a const &, even though this is not enforced. This could indicate that it is assumed that the function does not mutate its parameters. However, the example further down directly demonstrates mutation.

Can somebody explain the logic behind the compiler ignoring lines 8-11?

Bonus question: Why does the first function still return 65537 with Clang 15 and -O3 if line 10 (the debug output) is uncommented? I can understand that line 8 can then no longer be ignored, but the compiler could still skip line 11.

You're [violating strict aliasing](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) by treating a `uint32_t` as two `uint16_t` values in the same memory. That is undefined behavior. — Andrew Henle, Feb 08 '23 at 12:35
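Following up on the comment above, here is a minimal sketch of how the in-place variant might be written without accessing the uint32_t through a uint16_t pointer. It copies the bytes into a separate array with std::memcpy, increments the halves there, and copies them back. The function name incrementHalvesWithMemcpy is hypothetical and not part of the original snippet, and the sketch assumes that two uint16_t values occupy exactly the same number of bytes as one uint32_t (checked with a static_assert).

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical alternative to incrementWithReference(): the uint32_t is never
// accessed through a uint16_t lvalue, so the aliasing problem does not arise.
std::uint32_t incrementHalvesWithMemcpy(std::uint32_t input) {
    std::array<std::uint16_t, 2> halves{};
    static_assert(sizeof(halves) == sizeof(input),
                  "two uint16_t must cover exactly one uint32_t");

    // Copy the object representation into real uint16_t objects.
    std::memcpy(halves.data(), &input, sizeof(input));

    // Mutating through a reference in for_each is fine here, because the
    // elements really are uint16_t objects.
    std::for_each(halves.begin(), halves.end(), [](std::uint16_t& n) { ++n; });

    // Copy the modified bytes back into the uint32_t.
    std::memcpy(&input, halves.data(), sizeof(input));
    return input;
}
```

Calling incrementHalvesWithMemcpy(0) should give 65537 regardless of optimization level, since each half becomes 1 and the combined value is 0x00010001 on both little-endian and big-endian targets. Compilers typically turn small fixed-size memcpy calls like these into plain register moves, so this form need not cost anything at runtime.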
