24

Consider

void swap(int* a, int* b)
{
    if (a != b){
        *a = *a ^ *b;
        *b = *a ^ *b;
        *a = *a ^ *b;
    }   
}

int main()
{
    int a = 0;
    int b = 1;
    swap(&a, &b); // after this b is 0 and a is 1
    return a > b ? 0 : a / b;
}

swap is an attempt to fool the compiler into not optimising out the program.

Is the behaviour of this program defined? a / b is never reached, but if it were, you'd get a division by zero.

Bathsheba
  • 4
    My understanding of UB is that it requires that a path that would unavoidably lead to an expression containing UB be reached. That is, as long as there is a chance that the expression with UB isn't reached, there is no UB. Though I can't find a source. Otherwise, common strategies such as checking for `nullptr` before calling a member function would be UB. – François Andrieux Jan 11 '18 at 15:10
  • 2
    @FrançoisAndrieux I would say the UB is when and only when a path to the expression is taken. – Eugene Sh. Jan 11 '18 at 15:12
  • 3
    @You: That's on the C++ tag. I asked this on the C tag as the rules are generally simpler in C. – Bathsheba Jan 11 '18 at 15:12
  • 2
    @You That's different. That question is asking whether introducing UB into a branch is enough to force the compiler to assume the branch is unreachable. – François Andrieux Jan 11 '18 at 15:12
  • I'll grant you that the duplicate is about C++ rather than C, but it does ask exactly the same thing: _does the existence of [undefined behavior] in a given program mean that the whole program is undefined or that behavior only becomes undefined once control flow hits this statement?_ – You Jan 11 '18 at 15:14
  • @You would you post it as a duplicate if this was tagged java? – UKMonkey Jan 11 '18 at 15:15
  • @chux By "no" you mean it is *undefined*? – Eugene Sh. Jan 11 '18 at 15:15
  • 1
    @UKMonkey: No, because Java (as far as I know) doesn't have the exact same definition of "undefined behavior" as C++, whereas C does. – You Jan 11 '18 at 15:19
  • 1
    @You you quoted that UB is defined as "behavior, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately valued objects, for which this International Standard imposes no requirements" I'm not seeing how that means that they'll have the same definition of "erroneous program construct" (since they clearly don't!) – UKMonkey Jan 11 '18 at 15:45
  • @EugeneSh. The "no" refers to title "Is the behaviour ... (bunch of qualifiers) ... undefined?" - the behavior is not undefined. – chux - Reinstate Monica Jan 11 '18 at 16:13
  • @chux OK. Was not sure if it is for the *Is the behaviour of this program defined?* found in the question body... – Eugene Sh. Jan 11 '18 at 16:15
  • @chux: I've edited the question. It doesn't invalidate any of the answers methinks. – Bathsheba Jan 11 '18 at 16:20
  • @You unless I've missed something, the C standard **does not** contain the sentence quoted in the answer for the C++ question you refer to that says specifically: "However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)." I thus tend to agree with UKMonkey that C and C++ differ on this topic. – Virgile Jan 11 '18 at 17:03
  • Classical counterexample: `size_t size = sizeof( *(int*)0x0);` – edmz Jan 11 '18 at 20:28
  • 2
    @You That C may or may not have the same definition of UB as C++ is part of the answer, I should think. C and C++ *are* separate languages, and they're diverging more as the years pass, from what I understand. Nothing says they must have the same behavior, so I don't think closing as a duplicate of a C++ question is appropriate. – jpmc26 Jan 11 '18 at 23:48
  • Possible dup of [Can code that will never be executed invoke undefined behavior?](https://stackoverflow.com/q/18385020/1275169). – P.P Apr 25 '18 at 14:33

3 Answers

26

It is not necessary to base a position on this question on the usefulness of any given code construct or practice, nor on anything written about C++, whether in its standard or in another SO answer, no matter how similar C++'s definitions may be. The key thing to consider is C's definition of undefined behavior:

behavior, *upon use* of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

(C2011, 3.4.3/1; emphasis added)

Thus, undefined behavior is triggered temporally ("upon use" of a construct or data), not by mere presence.* It is convenient that this is consistent for undefined behavior arising from data and that arising from program constructs; the standard need not have been consistent there. And as another answer describes, this "upon use" definition is a good design choice, as it allows programs to avoid executing undefined behaviors associated with erroneous data.
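
As a minimal sketch of that "upon use" reading, the division below is present in the source but is evaluated only when its operands make it well defined. The helper name `checked_div` and its return convention are assumptions made for illustration; they do not come from the standard or from the question.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 and stores the quotient in *out when a / b
   is well defined; returns 0 without evaluating the division otherwise. */
int checked_div(int a, int b, int *out)
{
    if (b == 0)
        return 0;   /* division by zero would be undefined, so skip it */
    if (a == INT_MIN && b == -1)
        return 0;   /* quotient would not be representable in int */
    *out = a / b;   /* reached only when the result is defined */
    return 1;
}
```

Under this reading, a call such as `checked_div(1, 0, &q)` has defined behavior: it returns 0 and the expression `a / b` is never evaluated.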

On the other hand, if a program does execute undefined behavior then it follows from the standard's definition that the whole behavior of the program is undefined. This consequent undefinedness is a more general kind arising from the fact that the UB associated directly with the erroneous data or construct could, in principle, include altering the behavior of other parts of the program, even retroactively (or apparently so). There are of course extra-lingual limitations on what could happen -- so no, nasal demons will not actually be making any appearances -- but those are not necessarily as strong as one might suppose.


* Caveat: some program constructs are used at translation time. These produce UB in program translation, with the result that every execution of the program has wholly-undefined behavior. For a somewhat stupid example, if your program source does not end with an unescaped newline then the program's behavior is completely undefined (see C2011, 5.1.1.2/1, point 2).

John Bollinger
  • Your citation does not convince me. Use of a nonportable program construct could be that the developer added it to the program, and thus used the construct, even if it's never executed. The term `use` is unclear. Is there a wordlist in the standard? – Filip Haglund Jan 12 '18 at 03:54
  • This is an excellent answer, covering not only the strict definition of _undefined behavior_ according to the standard, but also the nuances involved in interpreting its meaning and consequences. – alecov Jan 12 '18 at 04:52
  • @FilipHaglund: Aside from the citation, if you think about it, the alternative is completely nonsensical, at least in the given example of division by zero; If the mere presence of a construct that is undefined behavior would lead to undefined behaviour even when unreachable, you could not use the division operator at all - any attempt at checking for zero would be moot if unreachability wouldn't prevent the undefined behaviour. – Aleksi Torhamo Jan 12 '18 at 05:25
  • @Aleksi good point, now I am convinced :) – Filip Haglund Jan 12 '18 at 11:07
22

The behavior of an expression that is not evaluated is irrelevant to the behavior of a program. Behavior that would be undefined if the expression were evaluated has no bearing on the behavior of the program.

If it did, then this code would be useless:

if (p != NULL)
    …; // Use pointer p.

(Your XORs could have undefined behavior, as they may produce a trap representation. You can defeat optimization for academic examples like this by declaring an object to be volatile. If an object is volatile, the C implementation cannot know whether its value may change due to external means, so each use of the object requires the implementation to read its value.)
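
A minimal sketch of the `volatile` suggestion, assuming the same values the question ends up with after its swap (a is 1, b is 0). The function name `guarded_quotient` is hypothetical.

```c
/* Sketch: because a and b are volatile, the implementation must read them at
   run time and cannot fold the conditional away during translation. */
int guarded_quotient(void)
{
    volatile int a = 1;   /* mirrors the question's values after swap() */
    volatile int b = 0;

    int x = a;            /* each access to a volatile object is a real read */
    int y = b;

    /* As in the question, x / y is evaluated only when x > y is false.
       With x == 1 and y == 0 the division is never evaluated. */
    return x > y ? 0 : x / y;
}
```

With these values the function returns 0, and no XOR tricks are needed to keep the optimizer from assuming the operands.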

Eric Postpischil
  • 5
    Beware that it's not strictly necessary that an expression with UB be reached to cause UB if it can be proven that it would eventually be reached. – François Andrieux Jan 11 '18 at 15:14
  • 1
    @FrançoisAndrieux It is not clear what this means. C programs are usually deterministic. If a path *can* be reached in a specific run (with determined input/time and/or `rand` output), it *will* be reached. – Eugene Sh. Jan 11 '18 at 15:23
  • @FrançoisAndrieux Added this to the comment. I say if you assume a very specific user input format, for instance, even knowing the deviation from it will lead to UB, you can safely tell that there is no UB as long as the format is matching. – Eugene Sh. Jan 11 '18 at 15:25
  • 3
    @EugeneSh. I meant that a program has UB if it can be proven that a path leads to an expression with UB under all conditions. That is, if at a given point it can be proven that all paths lead to UB, the program is already UB at that point. A compiler may make that determination and act accordingly, altering behavior before the actual UB expression is reached. – François Andrieux Jan 11 '18 at 15:28
  • @FrançoisAndrieux Can't argue with that. Was under impression you are claiming something else. – Eugene Sh. Jan 11 '18 at 15:29
  • @FrançoisAndrieux: I would be interested in support for that. In the interim, note that the C standard requires that the input and output dynamics of interactive devices shall take place as specified in 7.21.3, and it says the intent is that unbuffered or line-buffered output appears as soon as possible (5.1.2.3 6, describing *observable behavior*). So leaping into arbitrary behavior earlier than the actual evaluation of an undefined expression would seem to go against the spirit of that. – Eric Postpischil Jan 11 '18 at 15:33
  • +1 and accept. This is the kind of answer that reveals the deficiencies in a question: It's a pity that I submitted such a dreadful example. I don't want to edit it since that would spoil this answer. It might be helpful to us all though if you could expand the answer to other cases though. – Bathsheba Jan 11 '18 at 15:33
  • 1
    @Bathsheba: Go ahead and edit the question if you think that would improve it. And edit the answer some if it needs to match. I can review it later. Gotta go now. – Eric Postpischil Jan 11 '18 at 15:35
  • @EricPostpischil You can find examples illustrating this problem by searching for "undefined behavior leads to time travel" or similar titles, because UB can seem to modify behavior that happened before it was reached. Most examples are written in c++, but there are some written in c, such as [this one](http://shape-of-code.coding-guidelines.com/2012/07/12/undefined-behavior-can-travel-back-in-time/). – François Andrieux Jan 11 '18 at 15:39
  • Alternatively to using `volatile` to defeat the optimizer, you can introduce optimizer barriers by splitting the code into several files. If the optimizer cannot see the body of a function, it cannot draw conclusions about its behavior; and if the optimizer cannot see the call of a function, it cannot draw conclusions about its arguments. So both call and function will be compiled without knowledge of the other side. This is frequently exactly what you need. You just have to ensure that you do not switch on link time optimizations when you use this approach. – cmaster - reinstate monica Jan 11 '18 at 16:30
  • @cmaster, I would contend that what you need is rather to avoid undefined behaviors. Doing that perfectly can be hard, but avoiding code paths that can be proven at compile time inevitably to lead to UB is easier. Then you don't need to devise ways to disguise the UB, and you have better code to boot. – John Bollinger Jan 11 '18 at 19:32
  • Well, actually my comment was made with benchmarking in mind: When you benchmark stuff, you generally want the optimizer to do its work perfectly with the local code, but ignore the hints that tell it that the local code is indeed pointless - when benchmarking, I'm just interested in the runtime, after all, forget the results! This is precisely what the OP wanted to do with the `swap()` as well: Stop the optimizer from optimizing away their code (maybe to look at its assembly). Of course, avoiding UB is the first objective. But it does pay off to know how to stop the optimizer in a sensible way. – cmaster - reinstate monica Jan 11 '18 at 22:10
  • @FrançoisAndrieux: Oddly, that page seems to contradict the answers here; if I've understood correctly, the issue there is that although the relevant code was unreachable in UB-invoking cases, the compiler couldn't tell that, so it made use of the UB. That's . . . something. – ruakh Jan 12 '18 at 04:31
  • 1
    @ruakh In that case, the UB expression is not unreachable. The error reporting branch does not return or otherwise prevent control from continuing through the function, so control is assumed to eventually reach the UB. Rather, the example is saying that the branch could be removed during optimization because the only case when it could be taken is also a case where UB is certain to be reached. – François Andrieux Jan 12 '18 at 11:58
  • @FrançoisAndrieux: The linked page says, "The finger of blame could be pointed at[] the developers for not **specifying** that the function `ereport` does not return (this would **enable** the compiler to **deduce** that there is no undefined behavior because the divide is never execute when `arg2 == 0`)" [emphasis mine]. To me, that pretty clearly seems to be saying that the problem is that, even though `ereport` *doesn't* return, the compiler didn't *know* that. Am I misinterpreting it? Or is the page mistaken? – ruakh Jan 12 '18 at 19:30
  • @ruakh The compiler needs to prove that `ereport` eventually returns if it wants to optimize it out because of `arg1 / arg2`, otherwise there is a chance that the UB expression won't be reached, which means optimizing it out might alter the behavior of a well defined program. Note that "prove that it eventually returns" implies that it's proven within the confines of the language. For example, as a user, you can terminate a process but the compiler assumes that never happens. I'm not sure how `ereport` is expected to avoid returning, but it's apparently not detected by the compiler. – François Andrieux Jan 12 '18 at 19:38
0

In general, code which would invoke Undefined Behavior if executed must not have any effect if it is not executed. There are, however, a few cases where real-world implementations may behave in a contrary fashion and refuse to generate code for a construct which, while not a constraint violation, could not possibly execute with defined behavior.

extern struct foo z;

int main(int argc, char **argv)
{
    if (argc > 2) z;
    return 0;
}

By my reading of the Standard, it explicitly characterizes lvalue conversions on incomplete types as invoking Undefined Behavior (among other things, it's unclear what code an implementation could generate for such a thing), so the Standard would impose no requirements upon behavior if argc is 3 or more. I can't identify any constraint in the Standard that the above code would violate, however, nor any reason behavior should not be fully defined if argc is 2 or less. Nonetheless, many compilers including gcc and clang reject the above code entirely.

supercat
  • At `if (argc > 2) z;`, the compiler is supposed to evaluate `z`, but `z` is a variable of incomplete type, so it cannot be evaluated. The compilers are right to reject the code. – Jonathan Leffler Jan 11 '18 at 17:25
  • @JonathanLeffler: The Standard describes the evaluation of an incomplete type as being UB rather than a constraint violation. What it should do IMHO is have a category for constructs which compilers may accept or reject at their leisure without regard for whether they are executed, but must not interfere with operation unless either (1) they cause the program to be rejected, or (2) they are executed. – supercat Jan 11 '18 at 18:40
  • OK; so the compilers are correct to decide that they're going to reject your attempt to invoke UB, because they have no idea what you really mean and they can do what they like when it is UB. I don't see that as a compiler problem; it is bad C code. – Jonathan Leffler Jan 11 '18 at 18:43
  • @JonathanLeffler: Nothing I can see in the Standard would indicate that the aforementioned program would not have defined behavior if invoked with `argc <= 2`. While I would agree that the Standard probably *shouldn't* imply that quality compilers must accept such a program, the only justification I can see for an implementation doing otherwise would be declaring that such a program exceeds some implementation limit--a loophole that would justify doing almost anything with almost *any* program. – supercat Jan 11 '18 at 18:56
  • We'd best cease this discussion here. If you wish, we can remove all the comments. – Jonathan Leffler Jan 11 '18 at 18:57
  • I agree that the behavior of this program is well-defined when argc does not exceed 2, provided that it is linked with a translation unit that provides an external definition of `z`. Inasmuch as there is no completion of type `struct foo` that would cause lvalue conversion of identifier `z` to produce a side effect, I'd even argue that it would be perfectly logical and reasonable for a compiler to accept the program, despite the fact that under some circumstances it produces UB at runtime. – John Bollinger Jan 11 '18 at 19:59
  • I furthermore don't see any justification in the standard for a program's runtime UB of any kind to justify translation failure, but there are a few undefined behaviors that can only plausibly be construed as occurring at translation time, such as occurs when a translation unit's source does not end with a newline. Translation-time UB can certainly manifest as rejecting the program. – John Bollinger Jan 11 '18 at 20:13
  • @JohnBollinger: Perhaps I should have added a declaration of `z` following `main` to avoid any dependency upon anything that might appear in an external compilation unit. I think my key point stands, which is that while it would have been sensible for the Standard to allow implementations to reject such code, it doesn't actually do so [except under the absurd One Program Rule loophole]. – supercat Jan 11 '18 at 20:31
  • @supercat: What is the "One Program Rule"? I'm not finding anything relevant-seeming via Google. – ruakh Mar 25 '18 at 05:20