If the authors of the C++ Standard had intended that side-effect-free endless loops be treated as UB, the most natural and unambiguous way of specifying that would have been to state, as a constraint, that all side-effect-free loops shall terminate. The Standard does not use such terminology, however.
In general English usage, permission to assume X implies permission to perform some actions which would be unreasonable if X were false, without regard for whether X is true. It does not imply that if X is false, all possible actions one might do should be viewed as equally reasonable. I would argue that the Standard's use of "may assume" is meant to mirror the common English language usage.
Neither the C nor the C++ Standard has any way of allowing optimizing transforms to observably affect the behavior of an otherwise-defined program, even in cases where the transforms would merely replace one behavior which satisfies application requirements with another that would also satisfy them. There are at least three sensible ways an implementation could treat side-effect-free loops that might not terminate:
1. Require that the program behave in a manner consistent with sequential program execution.
2. Specify that if no individual action within a loop would be sequenced relative to a statically reachable action that follows it, the execution of the loop as a whole may be treated as unsequenced relative to that action.
3. Require that programmers add dummy side effects to loops which cannot be proven to terminate.
The second of these allows some optimizations that are easy, useful, and safe, since proving that no individual action within a loop will have observable side effects is vastly easier than proving that a loop will terminate. The third would require programmers to do more work than either of the other two, while forcing compilers either to generate less efficient code than under #1 or, at best (if there is an explicit "dummy side effect" that doesn't require generating useless code), to generate code that's no more efficient than under #1.
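As a rough illustration of the third approach, the sketch below adds a volatile store to a loop whose termination a compiler generally cannot prove. The names progress_marker and find_power_of_three are hypothetical, invented for this example. A volatile access counts as observable behavior, so the loop is no longer side-effect-free; the price is a store on every iteration, which is the "useless code" case mentioned above.

```cpp
#include <cassert>

// Hypothetical dummy object: its only job is to give the loop a side effect.
volatile unsigned char progress_marker = 0;

// Searches the powers of three (computed with wrapping unsigned arithmetic)
// for one whose masked value equals x. May loop forever if none exists.
unsigned find_power_of_three(unsigned x, unsigned mask)
{
    unsigned i = 1;
    while ((i & mask) != x) {
        progress_marker = 1;  // volatile store: an observable side effect
        i *= 3;               // unsigned arithmetic wraps, so no overflow UB
    }
    assert((i & mask) == x);  // holds whenever the loop exits
    return i;
}
```

Calling find_power_of_three(27, 65535) exits after three iterations, since 27 is itself a power of three below 65536.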
Consider the following function:
    unsigned char arr[65537];
    unsigned test(unsigned x, unsigned mask)
    {
        unsigned i=1;
        while((i & mask) != x)
            i*=3;
        if (x < 65536)
            arr[x] = 1;
        return i;
    }
If the function is called by code that ignores the return value, the loop would serve no purpose in scenarios where it would terminate, and there is no particular reason that the code as written would care about whether the loop does terminate.
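To make this concrete, here is a sketch of what the call could reduce to for such a caller if the loop were skipped while the bounds check was kept, as the second approach listed earlier would permit. The name test_result_ignored is hypothetical, and this is a hand-written paraphrase, not the output of any compiler.

```cpp
unsigned char arr[65537];

// Hand-written equivalent of test(x, mask) for a caller that discards the
// return value, assuming the loop may be skipped because nothing after it
// depends on anything the loop computes.
void test_result_ignored(unsigned x, unsigned mask)
{
    (void)mask;        // only the skipped loop used mask
    if (x < 65536)     // the bounds check is retained
        arr[x] = 1;
}
```

Because the bounds check survives, a call such as test_result_ignored(70000, 65535) writes nothing, whether or not the original loop would have terminated for those arguments.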
On the other hand, if mask is e.g. 65535, a compiler could usefully replace the condition of the if statement with ((i & 65535)==x && (x < 65536)), recognize that the second part of that condition will always be true any time it's evaluated, and recognize that the first part may be consolidated with the test in the loop and thus omitted as redundant. When using approach #2, however, the body of the loop would then be sequenced before the compiler's added test for (i & 65535)==x, so the loop could no longer be treated as unsequenced relative to the store that depends on that test.
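The transformation just described can be sketched at the source level for the specific case where mask is 65535. The function name test_mask_65535 is hypothetical, and this is a hand-written paraphrase rather than real compiler output. The point it tries to show is that the explicit bounds check can be dropped only because the loop's exit test is kept.

```cpp
#include <cassert>

unsigned char arr[65537];

unsigned test_mask_65535(unsigned x)
{
    unsigned i = 1;
    while ((i & 65535u) != x)  // this test must stay...
        i *= 3;
    // ...because it guarantees (i & 65535) == x here, hence x <= 65535.
    assert(x < 65536);         // always true when this line is reached
    arr[x] = 1;                // explicit bounds check omitted as redundant
    return i;
}
```

With x equal to 9 the loop exits after two iterations and the store lands at arr[9]; with an x that no power of three ever matches, the loop never exits and the store is never reached.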
Regardless of what the authors of the Standard may have intended, however, the authors of clang and gcc have decided to interpret the Standard as specifying that implementations may treat endless loops as UB, which lets the compiler omit as redundant both the test for (i & mask)==x and the test for x < 65536. For purposes of preventing out-of-bounds array access, each test would be redundant when the other is included, but omission of either test would render the other essential. Clang and gcc, however, eliminate both tests, thus allowing the presence of what should be a side-effect-free loop to cause arbitrary memory corruption.
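At the source level, the combined effect described above (for a caller that ignores the return value, with mask known to be 65535) would look roughly like the sketch below. The name test_both_checks_removed is hypothetical; this paraphrases the outcome described in the text and is not captured compiler output, which will vary with compiler version and optimization flags.

```cpp
unsigned char arr[65537];

// Neither the loop test nor the bounds test remains, so any x >= 65537
// would index past the end of arr.
void test_both_checks_removed(unsigned x)
{
    arr[x] = 1;  // unconditional store: safe only if the caller keeps x in range
}
```

Nothing in this version protects the array, so the safety of the store rests entirely on the caller.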