"The implementation may assume X" implies undefined behavior when X does not hold?

Question

Recently someone tried to convince me that cppreference is wrong when it says that infinite loops without side effects would be undefined behavior. One question related to infinite loops being undefined or not is Infinite loop vs infinite recursion. Are both undefined?. Accepted answer quotes from the standard and concludes: Yes, infinite loops without side effects are undefined behavior.

Other related questions are:

in C: Is an (empty) infinite loop undefined behavior in C?
answer doesnt say it explicitly, but also do not disagree with UB: Is infinite loop still undefined behavior in C++ if it calls shared library?
same quote, same conclusion: Is this infinite recursion UB?, though this answer makes me doubt, it claims that every loop can be assumed to terminate, but infinite loops are not UB.

The relevant quote is ([basic.progress]p1):

The implementation may assume that any thread will eventually do one of the following:

terminate,

make a call to a library I/O function,

perform an access through a volatile glvalue, or

perform a synchronization operation or an atomic operation.

When the standard says "The implementation may assume X", does this wording imply that code that contains "not X" is undefined?

(in this example it would imply that eg for (;;); is undefined, but I am curious about the meaning of the wording in general)

i hope its not too much background for a question that is one sentence only. I hope you will not miss that when considering to answer or vtc — 463035818_is_not_an_ai, Mar 27 '23 at 20:28
_"code that contains "not X" is undefined"_ - If the code does not contain _any_ of the Xes, it has undefined behavior. It's enough that it contains one of the Xes for it to avoid having UB (for the "infinite loop" reason). — Ted Lyngmo, Mar 27 '23 at 20:29
@TedLyngmo thats how I understand it, but someone told me that I am wrong, and I had no better argument then this quote and they said, no thats not UB. I am still somewhat convinced that it is, but not completely ;) — 463035818_is_not_an_ai, Mar 27 '23 at 20:30
While not explicit saying it is UB, if the implementation is allowed to assume your loop will do a thing, and you write a loop that does not, then whatever the implementation decides to do with it is valid, which is kind of the textbook definition of UB — NathanOliver, Mar 27 '23 at 20:31
@Tedlyngmo they said compilers can remove infinite loops but it is not undefined behavior — 463035818_is_not_an_ai, Mar 27 '23 at 20:31
@463035818_is_not_a_number _"compilers can remove infinite loops but it is not undefined behavior"_ - If the implementation specifies exactly how and when such removal will happen, it is free to do so. Without that specification, the removal can happen in one loop, but not another loop looking exactly the same and then I'd say the result is UB. — Ted Lyngmo, Mar 27 '23 at 20:33
As a matter of pure formal logic, any proposition can be deduced from “X and not X”. Therefore, if “not X” is true in the program and “X” may be assumed to be true from the standard, then the compiler may deduce anything from that, including “the program does Y” for an arbitrary Y or “the behavior is not defined.” — Eric Postpischil, Mar 27 '23 at 20:34
@EricPostpischil C++, though, makes a distinction between "not defined" and "undefined behavior", does it not? I believe nasal demons only come from one of those. — Drew Dormann, Mar 27 '23 at 20:37
"Recently someone tried to convince me that cppreference is wrong " -- a lot of people who learned the language by trial and error on one particular platform are in denial about things the standard says , particularly when it comes to undefined behaviour. — M.M, Mar 27 '23 at 21:32
If the compiler completely removes an infinite loop, then the code after that loop will execute. That would qualify as undefined behavior to me. Undefined doesn't always mean unexplainable. — Mark Ransom, Mar 27 '23 at 22:36
It doesn't make sense to remove an infinite loop. It is not a thing. What would you replace the loop with? — curiousguy, Mar 28 '23 at 00:20
@curiousguy optimizing compilers will often remove code that is deemed Undefined Behavior, because Undefined Behavior can do _anything_ and doing nothing is a very fast implementation of doing anything. — Drew Dormann, Mar 28 '23 at 00:28
@DrewDormann Again, it doesn't make sense. You don't just "remove code". How would you do that? What do you replace it with? — curiousguy, Mar 28 '23 at 05:56
@curiousguy there's no need to replace it with anything, just pretend it doesn't exist. For real world examples see https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633 — Mark Ransom, Mar 28 '23 at 17:42
@curiousguy why it makes no sense? It can be removed like dead code can be removed. Though the intersting cases is not actually those that have UB, but when the compiler can use the assumption that there is no UB to gain opportunities for optimization — 463035818_is_not_an_ai, Mar 28 '23 at 17:53
@DrewDormann "distinction between "not defined" and "undefined behavior", does it not? I believe nasal demons only come from one of those." no, simply anything that is not defined by the standard is undefined. The standard never says "if your code has undefined behavior then bad things will happen / nasal demons will appear", its the absense of such statement that make it undefined. Only sometimes it explicitly mentions something to be undefined. However, "only this is defined" vs "not this is undefined" is just a matter of words (as I also had to rediscover by answering my own question ;) — 463035818_is_not_an_ai, Mar 28 '23 at 19:23
@DrewDormann the nasals are just a consequence of compiler implementation details when you break the preconditions — 463035818_is_not_an_ai, Mar 28 '23 at 19:24
@463035818_is_not_a_number I could be wrong, but I think you are overlooking "implementation defined". Such behaviors are neither standard-defined nor UB. — Drew Dormann, Mar 28 '23 at 19:40
@DrewDormann yes you are right, though "implementation defined" means that the implementation must define (and document) it. There is also "unspecified" which is closer to undefined, though it just means that either outcome is possible, like eg when comparing pointers that do not point into the same array via `<` — 463035818_is_not_an_ai, Mar 28 '23 at 19:45
@463035818_is_not_a_number: The Standard has no meaningful way of saying *anything* about situations where useful optimization may result in program behavior which is observably inconsistent with sequential execution, but may still be consistent with application requirements. For example, *cleanly* removing a loop which has a *statically reachable* exit and no observable side effects, without regard for whether the loop would terminate, would make programs more efficient in cases where the loop does terminate, while being unlikely to violate program requirements if it does not. — supercat, Apr 25 '23 at 21:13
@463035818_is_not_a_number: Clang and gcc defenestrate any advantages such optimizations could offer, however, by having code that follows a loop assume that loop exit requirements have been met *without recognizing such reliance as creating a dependency on actions performed within the loop, which should block the omission thereof*. If the only way to ensure that a loop which would be free of outside side effects beyond possibly blocking program execution is to add dummy side effects to prevent such compiler behavior, the loop-based optimization would only be useful on "broken" programs. — supercat, Apr 25 '23 at 21:30

463035818_is_not_an_ai · Answer 1 · 2023-03-27T20:49:22.677

7

Undefined behavior is defined as: intro.defs:

behavior for which this document imposes no requirements

For this question this means: implementations may assume that X always holds. The standard imposes no requirements on implementations when X does not hold (because anyhow it does hold always). Infinite loops without side effects are undefined.

Browsing the standard I found another example of the "may assume" wording. In [priority.queue].1

[...] the library assumes that the function or function object defines a strict weak ordering (28.7).

Using a comparator with a priority queue that does not define a strict weak ordering is undefined.

edited Mar 27 '23 at 20:49

answered Mar 27 '23 at 20:37

463035818_is_not_an_ai

109,796
11
89
185

I think you've come to the correct conclusion. I wish that that _"someone"_ who tried to convince you of the opposite shows up to dispute this so we can hear the full argument. – Ted Lyngmo Mar 27 '23 at 20:47
1

@TedLyngmo Finding another example of where the standard uses the wording finally convinced me, now I can go to sleep without bad dreams about infinite loops, thats what counts :D. And when I meet them again I can point them here to write an answer – 463035818_is_not_an_ai Mar 27 '23 at 20:50
@TedLyngmo: I've offered in an answer an example of why treating the construct as UB is less useful than granting a compiler more *limited* ability to make use of assumptions regarding loop termination. Someone apparently didn't like it, but requiring that programmers avoid various corner cases at all costs has the effect of making it so the only way to prevent phony breaking "optimizations" is to to expend efforts that shouldn't be necessary, to block optimizations that would otherwise have been useful. – supercat Apr 12 '23 at 16:42
@supercat I'm not qualified to vote either way on your ideas but it does seem like you haven't provided a language lawyering answer. I interpret it as improvement ideas. – Ted Lyngmo Apr 12 '23 at 17:17
@TedLyngmo: If the intention of the Standard was to characterize endless loops as UB, the authors could have clearly and unambiguously said that by specifying a constraint that all loops without particular kinds of side effects shall terminate. Only in the minds of C and C++ compiler writers does permission to assume X imply permission to do absolutely anything and everything if X isn't true. In other contexts, it implies permission to do *some things* that would normally require that X be true, without having to confirm that it is. – supercat Apr 12 '23 at 17:26
@supercat That may be. I think I've read something explaining that reasoning a long time ago. Can't remember where though. – Ted Lyngmo Apr 12 '23 at 17:35
@TedLyngmo: The authors of the Standard clearly wanted to allow compilers some freedom to process potentially-endless loops in a manner that would not always be consistent with sequential program execution, but they refrained from specifying a constraint that all loops must terminate. I think the most natural interpretation is that they intended to invite useful optimizing transforms, without implying that programmers must go out of their way to guard against excessively aggressive transforms that would be worse than useless. – supercat Apr 12 '23 at 17:36
@supercat please first consider the wording only. When the standard says "Implementations may assume no foo" does that imply that code with foo has undefined behavior? Thats my question. And then of course I am also curious how it applies to the case of infinite loops. For the example of strict weak ordering I am (almost) certain that a comparator that does not implement a strict weak ordering is UB – 463035818_is_not_an_ai Apr 12 '23 at 17:37
@463035818_is_not_a_number I'd reword the question (in the actual question) to something like: _"does this wording imply that code that does not contain any of the above has undefined behavior?"_ since a thread is only required to do _one_ of the things listed to avoid UB. – Ted Lyngmo Apr 12 '23 at 17:43
@463035818_is_not_a_number: I've added a couple more paragraphs at the start of the answer to address the question from a more language-lawyerly angle. The Standard has a standard phraseology it could have used if the intention was to characterize side-effect-free endless loops as UB. The authors instead used phraseology which, if interpreted according to the normal English language usage of the words involved, would grant more limited permissions, but without explicitly saying what the permissions are. There are many situations where the C or C++ Standards... – supercat Apr 12 '23 at 17:50
...make no effort to anticipate and forbid all of the obtuse ways that constructs could be processed by a compiler that was designed to be as useless as possible while still being "conforming". The fact that the Standard would impose no restrictions on how an obtuse compiler might process a construct which quality compilers process usefully does not imply any judgment that programmers should be required to go out of their way to protect themselves against everything an obtuse compiler might do. – supercat Apr 12 '23 at 18:03
It's too bad that historically, most questions about whether some useful construct X invokes UB weren't met with a reply "The Standard imposes no requirements on how a low-quality-but-conforming implementation would process X. Why--*are you seeking to write one*?" – supercat Apr 12 '23 at 18:07
@463035818_is_not_a_number: Another point to consider when trying to "language lawyer" C or C++ is that the C's definition of "conforming program" includes programs upon whose behavior the Standard imposes no requirements, and C++ *explicitly waives* any concept of conformance for programs that would be allowed to "survive" compilation. – supercat Apr 12 '23 at 18:23
@supercat I appreciate your answer. However, i am afraid before accepting any answer I will have to do more reading and research. Seems like there are two camps, at least [cppreference::ub](https://en.cppreference.com/w/cpp/language/ub) is rather clear, several q&as here agree, and you aren't the first one i see to contradict – 463035818_is_not_an_ai Apr 12 '23 at 19:23
@463035818_is_not_a_number: The C++99 standard and every version since have been a bit hand-wavy in spots where compilers writers were expected to make reasonable efforts to uphold the Principle of Least Astonishment, but were expected to better placed than the Committee to judge what behaviors their users would find "astonishing". While the Standard does not impose any limits on how a compiler might exploit an assumption that side-effect-free loops will terminate, the use of "may assume that X happens" versus "X shall happen" language would suggest to me an intention... – supercat Apr 12 '23 at 19:43
...to let compiler writers exercise judgment over how to balance optimization with the Principle of Least Astonishment, rather than inviting compiler writers to throw the POLA out the window. – supercat Apr 12 '23 at 20:05
@463035818_is_not_a_number: I've added to my answer an example of a situation where clang and gcc would sometimes process an endless loop that is free of side effects as written in a manner that would cause memory corruption, but where a more limited deviation from sequential program execution would be more useful. – supercat Apr 13 '23 at 16:56
If an implementation documents anything about its priority queue implementation, and such documentation would only be consistent with its behaving in a certain ways when given a particular comparison function, the implementation should be expected to behave in a manner consistent with that documentation even if the comparison function is non-transitive. Depending upon the task to be performed, it may be more useful for an implementation to guarantee that no comparison behavior could result in an out-of-bounds array access, or for an implementation to prioritize speed at the risk... – supercat Apr 13 '23 at 17:46
...of having non-transitive comparisons result in erroneous array index computations. I suspect individual members of the Commitee may have had opinions about which behavior was better, but there was neither a consensus to suggest that implementations should do some minimal array-bounds checking to limit the effect of non-transitive comparisons, nor a consensus to suggest that implementations should forego bounds checking so as to maximize performance. – supercat Apr 13 '23 at 17:52
@supercat after work my capacity is limited. Forgive me if I'll need some time to digest the comments ;) – 463035818_is_not_an_ai Apr 13 '23 at 17:53
@463035818_is_not_a_number: Another observation is that if older parts of the Standard were being written today, I think many would have used "may assume" language to indicate parts that were meant to convey a "hand-wavy" implication of "may assume... *when doing so would be useful and sensible*". People making a bona fide effort to make their compilers useful for programmers shouldn't need to be explicitly told not to make absurd assumptions that would interfere with what programmers needed to do. – supercat Apr 13 '23 at 18:47

supercat · Answer 2 · 2023-04-13T16:48:15.613

If the authors of the C++ Standard had intended that side-effect-free endless loops be treated as UB, the most natural and unambiguous way of specifying that would be to specify as a constraint that all side-effect-free loops shall terminate. The Standard does not use such terminology, however.

In general English usage, permission to assume X implies permission to perform some actions which would be unreasonable if X were false, without regard for whether X is true. It does not imply that if X is false, all possible actions one might do should be viewed as equally reasonable. I would argue that the Standard's use of "may assume" is meant to mirror the common English language usage.

Neither the C nor C++ Standard have any way of allowing optimizing transforms to observably affect the behavior of an otherwise-defined program, even in cases where the transforms would merely replace one behavior which satisfies application requirements with another that would also satisfy application requirements. There are at least three sensible ways an implementation could treat side-effect-free loops that might not terminate:

Require that the program behave in a manner consistent with sequential program execution.
Specify that if no individual action within a loop would be sequenced relative to a statically reachable action that follows it, the execution of the loop as a whole may be treated as unsequenced relative to that action.
Require that programmers add dummy side effects to loops which cannot be proven to terminate.

The second of these allows some optimizations that are easy, useful, and safe, since proving that no individual action within loop will have observable side effects is vastly easier than proving that a loop will terminate. The third of these will require programmers to do more work than would be required for the other two, while forcing compilers to at either generate less efficient code than for #1, or at best--if there is an explicit "dummy side effect" that doesn't require generating useless code--generate code that's no more efficient than #1.

Consider the following function:

unsigned char arr[65537];
unsigned test(unsigned x, unsigned mask)
{
  unsigned i=1;
  while((i & mask) != x)
    i*=3;
  if (x < 65536)
    arr[x] = 1;
  return i;
}

If the function is called by code that ignores the return value, the loop would serve no purpose in scenarios where it would terminate, and there is no particular reason that the code as written would care about whether the loop does terminate.

On the other hand, if mask is e.g. 65535, a compiler could usefully replace the second if condition with ((i & 65535)==x && (x < 65536)), and recognize the second part of the condition will always be true any time it's evaluated, and recognize that the first conditional test may be consolidated with test in the loop, and thus omitted as redundant. On the other hand, when using approach #2, the body of the loop would be sequenced before the compiler's added test for (i & 65535)==x.

Regardless of what the authors of the Standard may have intended, however, the authors of clang and gcc have decided to interpret the Standard as specifying that implementations may treat endless loops as UB, thus allowing for the compiler to omit as redundant both the test for (i & mask)==x and the test for x < 65536. For purposes of preventing out-of-bounds array access, each test would be redundant when the other is included, but omission of either test would render the other essential. Clang and gcc, however, eliminate both tests, thus allowing the presence of what should be a side-effect-free loop to cause arbitrary memory corruption.

nice example. So you are saying, the intention of the authors was not that infinite loops are ub, but compiler writers interpereted it as ub and make code produce bogus results (results that cannot be explained by an infinite loop being ignored alone). right? — 463035818_is_not_an_ai, Apr 13 '23 at 17:26
@463035818_is_not_a_number: Precisely. Under the clang/gcc interpretation, it's possible for a piece of code which would, taken in isolation, have no side effects other than possibly blocking execution indefinitely, to arbitrarily disrupt the behavior of surrounding code. This also happens with their interpretation of `restrict` when processing C code. A program may have defined behavior if function that accepts an `int` argument and returns an `int` computes its return value in any manner that could yield a non-zero value when the input is zero, or that could yield zero when... — supercat, Apr 13 '23 at 17:34
...the input is 1, but yield UB if passing a 0 or 1 would always result in the function yielding the value that was passed in. — supercat, Apr 13 '23 at 17:36
ok. Could you briefly comment on the comparator thingy? Refer to the quote in my answer, does it imply that formally using a comparator that is not a strict weak ordering is UB? or does it merely cause bogus ordering of the priority queue? In practice i suppose its the latter, because the compiler can impossibly check if a comparator is a strict weak ordering for all possible input — 463035818_is_not_an_ai, Apr 13 '23 at 17:38

"The implementation may assume X" implies undefined behavior when X does not hold?

2 Answers2