When the initializer for a `const int` is a constant expression (like `0`), the language rules say it's usable in constant expressions, effectively `constexpr` (thanks @Artyer for pointing this out). So there is a difference in C++ semantics between `const int p = 0;` and `const int p = foo();` unless you declare `constexpr int foo(){...}`, which is probably why compilers optimize them differently in practice.
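A minimal sketch of that semantic difference (hypothetical `foo` / `cfoo` names, not from the question): a `const int` with a constant-expression initializer is itself usable in constant expressions, while one with a runtime initializer is not.

```c++
int foo() { return 0; }            // not constexpr: a call to it isn't a constant expression
constexpr int cfoo() { return 0; } // constexpr: cfoo() is a constant expression

void demo() {
    const int a = 0;        // constant-expression initializer
    const int b = cfoo();   // also a constant-expression initializer
    const int c = foo();    // runtime initializer: still const, but not usable in constant expressions

    static_assert(a == 0, "a is usable in constant expressions");
    static_assert(b == 0, "so is b");
    // static_assert(c == 0, "");   // error: c isn't a constant expression
}
```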
When the definition of `blind()` isn't visible to the optimizer, I think this is still a missed optimization by GCC (and clang, ICC, and MSVC). They could choose to assume that nothing can modify a `const` the same way they assume nothing modifies a `constexpr`, because a program that does so has undefined behaviour.
When `blind()` is in the same compilation unit without `__attribute__((noinline,noipa))`, the UB is visible at compile time if optimization is enabled, so all bets are off and no amount of weirdness is particularly surprising.
But with just a prototype for `blind()`, compilers have to make asm that would work for a `blind()` that didn't have undefined behaviour, so it's interesting to look at what assumptions/optimizations they did make, and to consider whether they'd be allowed to compile the way you expected.
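For reference, the shape of the test case I'm describing is roughly the following (my reconstruction using the names from above; the real `blind()` presumably casts away `const` and writes through the pointer, and lives in another translation unit so the optimizer can't see it):

```c++
#include <iostream>

int constant() { return 0; }   // swap in  constexpr int constant()  for the section further down

void blind(const int *p);      // prototype only: definition in another file, invisible to the optimizer

int main() {
    const int p = constant();  // or:  const int p = 0;
    blind(&p);
    std::cout << p << '\n';    // does this reload p from memory, or use the 0 it was initialized with?
}
```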
With `const int p = 0;`, GCC and clang propagate that constant to later uses of `p` in the same function (even with optimization disabled), correctly assuming that nothing else can possibly have changed the value of a `const` object. (Not even a debugger, which is something gcc and clang's `-O0` default code gen is designed to support for non-`const` variables; that's one reason why they make separate blocks of asm for each statement which don't keep anything in registers across statements.)
I think it's a missed optimization not to constant-propagate `const int p = constant();` the same way, after inlining `constant()` to a constant `0`. It's still a `const int` object, so it's still UB for anything else to modify it.
Of course that doesn't happen in a debug build; without inlining `constant()`, they don't know at compile time what the actual value will be, so they can't use it as an immediate operand for later instructions. Compilers load it from memory at `p`'s usual address, the same one they passed to `blind()`, so they use the modified value in debug builds; that's expected.
In optimized builds, they don't call `constant()`; they store an immediate `0` to initialize the stack space whose address they pass to `blind()`, like we'd expect. But then after the call, they reload it instead of using another immediate `0`. This is the missed optimization.
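The optimized output follows roughly this pattern (a hedged, Intel-syntax paraphrase; the exact stack offsets and instruction scheduling vary by compiler and version, this is not verbatim output):

```asm
        mov     DWORD PTR [rsp+12], 0     ; store immediate 0 to initialize p
        lea     rdi, [rsp+12]             ; pass p's address
        call    blind(int const*)
        mov     esi, DWORD PTR [rsp+12]   ; reload p after the call -- the missed optimization
                                          ; (mov esi, 0 would be equally legal here, with no load)
```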
For a large object, it could be more efficient to use the copy that exists in memory instead of generating it again, especially if it's passed by reference to a print function. But that's not the case for `int`; it is more efficient to just zero a register as an arg passed by value to `std::ostream::operator<<(int)` than to reload from the stack.
## `constexpr` changes behaviour (for both debug and optimized)
With `constexpr int constant(){ return 0; }`, GCC and clang treat `const int p = constant();` exactly the same as `const int p = 0;`, because `constant()` is a constant expression just like `0`. It gets inlined even with `gcc -O0`, and the constant `0` gets used after the call to `blind()`, not reloading `p`.
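Concretely, the only source change needed for that behaviour is the one keyword (sketch):

```c++
// The only change vs. the earlier setup: mark constant() as constexpr.
constexpr int constant() { return 0; }

// Now  const int p = constant();  has a constant-expression initializer,
// so it's treated exactly like  const int p = 0;  even at -O0:
// the 0 is used directly after the call to blind(), with no reload of p.
```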
Still not an example of code whose behaviour changes at `-O0` vs. `-O3`, though.
Apparently it matters to the compiler internals that it was initialized with a "constant expression", whether that's a literal or a `constexpr` function's return value. But that's not fundamental; it's still UB to modify a `const int` no matter how it was initialized.
I'm not sure if compilers are intentionally avoiding this optimization or if it's just a quirk. Maybe not intentionally for this case, but as collateral damage of avoiding some class of things for some reason?
Or perhaps just because, for constant-propagation purposes, it's not known until after inlining `constant()` that `const int p` will have a value that's known at compile time. But with `constexpr int constant()`, the compiler can treat the function call as part of a constant expression, so it definitely can assume it will have a known value for all later uses of `p`. This explanation seems overly simplistic, because normally constant-propagation does work even for things that aren't `constexpr`, and GCC/clang transform program logic into SSA form as part of compilation, doing most of the optimization work on that, which should make it easy to see if a value is modified or not.
Maybe when considering passing the address to a function, they don't consider that the underlying object is known to be `const`, only whether it was initialized with a constant expression. If the object in question was only passed or returned by pointer/reference to this function, like `const int *pptr = foo();` and `blind(pptr)`, the underlying object might not be `const`, in which case `blind()` could modify `*pptr` without UB.
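A sketch of why that distinction matters (hypothetical code, not from the question): writing through a pointer-to-`const` is only UB if the pointed-to object itself is `const`.

```c++
void blind(const int *p) {
    *const_cast<int*>(p) = 1;  // legal only if the pointed-to object isn't actually const
}

int main() {
    int x = 0;                 // underlying object is NOT const
    blind(&x);                 // well-defined: x is now 1

    const int y = 0;           // underlying object IS const
    blind(&y);                 // undefined behaviour
}
```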
I find it surprising that both GCC and clang miss this optimization, but I'm pretty confident that it is actually undefined behaviour for `blind()` to modify the pointed-to `const int`, even when it's in automatic storage. (Not static storage, where it could actually be in a read-only page and crash in practice.)
I even checked MSVC and ICC 2021 (classic, not LLVM-based), and they're the same as GCC/clang: not constant-propagating across `blind()` unless you use a constant expression to init `p`, making it effectively `constexpr`. (GCC/clang targeting other ISAs are of course the same; this optimization decision happens in the target-independent middle-end.)
I guess they all just base their optimization choice on whether or not it's `constexpr`, even though all 4 of those compilers were independently developed.
To make the asm simpler to look at on the Godbolt compiler explorer, I changed `cout << p` to `volatile int sink = p;`, to see whether gcc/clang would store a constant zero (`mov dword ptr [rsp+4], 0`) or would load+store to copy from `p`'s address to `sink`. The asm for `cout << p << '\n'` was fairly simple, but still messy compared to that.

Seeing constant vs. load+store is the behaviour we're ultimately interested in, so I'd rather see that directly than see a 0 or 1 printed and have to think through which I was expecting in which case. You can mouseover the `volatile int sink = p;` line and Godbolt will highlight the corresponding instruction(s) in the asm output panes.
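The simplified test function was roughly this (my reconstruction):

```c++
int constant() { return 0; }     // or  constexpr int constant()  for the other case
void blind(const int *);         // prototype only, defined elsewhere

void test() {
    const int p = constant();
    blind(&p);
    volatile int sink = p;       // mouseover this line on Godbolt:
                                 // a store of an immediate 0, or a load+store copy from p?
}
```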
I could have just done `return p`, especially from a function not called `main` so it's not special. In fact that's even easier and makes even simpler asm (though the difference becomes load vs. zeroing a register, instead of 2 instructions vs. 1). Still, it avoids the fact that GCC implicitly treats `main` as `__attribute__((cold))`, on the assumption that real programs don't spend most of their time in `main`. But the missed optimization is still present in an `int foo()`.
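That version would look something like this (again a sketch, with the same `constant()` and `blind()` declarations as above):

```c++
int foo() {                    // not main, so no implicit __attribute__((cold)) treatment
    const int p = constant();
    blind(&p);
    return p;                  // the interesting difference: return a constant (e.g. xor eax,eax)
                               // vs. reload p from the stack
}
```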
If you wanted to look at the case where UB is visible at compile time (which I didn't), you could see if it was storing a constant `1` when `blind()` was inlined. I expect so.
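That test would be something like the following (a sketch I didn't check the output for), with `blind()` defined in the same TU and no `noinline`/`noipa` attributes:

```c++
void blind(const int *p) {
    *const_cast<int*>(p) = 1;    // UB when the pointed-to object really is const
}

int foo() {
    const int p = 0;
    blind(&p);
    return p;    // with blind() inlined, the UB is visible; returning a constant 1 (or 0,
                 // or anything else) would be unsurprising
}
```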