89

If the code that invokes undefined behavior (in this example, division by zero) is never executed, does the program still have undefined behavior?

int main(void)
{
    int i;
    if(0)
    {
        i = 1/0;
    }
    return 0;
}

I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.

So, any ideas?

Andrew Henle
Yu Hao
  • 7
    I'd say it's not "behavior" if it's never executed – Kevin Aug 22 '13 at 15:42
  • 1
    If the UB is a runtime one (like this), it wouldn't be. But I highly doubt the standard says anything about this. – keltar Aug 22 '13 at 15:43
  • 1
    "Invokes undefined behavior" - doesn't "invoke" mean it's executed? And if it's not executed, wouldn't mean it doesn't actually "invoke" undefined behavior? – Cornstalks Aug 22 '13 at 15:43
  • It's either non-existent code, (optimized out) or, essentially, a comment. – Martin James Aug 22 '13 at 15:43
  • 13
    Sounds like a question of semantics, not programming. – Wooble Aug 22 '13 at 16:06
  • It's undefined behavior, to have a program which does anything for which the C standard does not specify an outcome. Therefore, a standards-conforming compiler could do anything (or generate code which does anything) from formatting your hard drive to making daemons fly out your nose. There is literally no restriction whatsoever on what a compiler does when faced with undefined behavior. – Lily Chung Aug 22 '13 at 16:19
  • 14
    @Wooble I disagree. The phrase **undefined behavior** has a special meaning in C/C++. And this question is related to other situations in which one has to determine whether something is undefined behavior. For the record, if you have read the C/C++ standard, you'll find the phrase **undefined behavior** everywhere. – Yu Hao Aug 22 '13 at 16:30
  • 1
    I know what UB is. Asking whether this program contains UB is a semantic question. The behavior of this particular program is well-defined, but it *does* contain code that's UB. – Wooble Aug 22 '13 at 16:36
  • 11
    @Cornstalks: The C standard does not use the phrase “invokes undefined behavior”, so you cannot reason about the C standard based on what this phrase might mean. Using it to describe C is inappropriate because it suggests that “undefined behavior” is a **thing** such as a wall you run into if you go out of bounds. Actually, “undefined behavior” is a lack of a thing; it is the end of boundaries. When you leave the well-defined town that is standard C, you are in an open field where anything can be built. – Eric Postpischil Aug 22 '13 at 16:37
  • 1
    "Code that will never be executed" should be deleted. Regardless of the kind of behavior it invokes. – chharvey Aug 23 '13 at 02:56
  • 1
    Run this code once or a billion times, it will always do the same thing, and the thing it does is very well defined. Also, unless the compiler is eminently stupid (or naive), it will optimize out the branch, so the resulting program will not contain any lines that result in undefined behavior anyhow. – Robert C. Barth Aug 23 '13 at 07:21
  • 1
    @RobertC.Barth Undefined behavior doesn't mean you run a program a billion times, and some may go right, some may go wrong. It means the program may work in one machine/compiler, but it may not in another. As for the code example, of course it's not practical, but I think it's a base question about a bunch of undefined behavior problems, so I think discussing the problem is useful. – Yu Hao Aug 23 '13 at 07:28
  • 1
    @RobertC.Barth In addition to what Yu said, the effects of UB can change if you add a line of code, even on the same platform (because, for example, the compiler allocates variables in a different place and the "random" memory you wrote to becomes unmapped, or because of a slight change in how an optimization is performed). And even if you do target a single platform and compiler, something as simple as a [version change](http://blog.regehr.org/archives/918) can break your program. – Maciej Piechotka Aug 23 '13 at 09:19
  • It doesn't matter what platform you run this code on, it's going to do the same thing everywhere. It's totally deterministic. There's no way it's doing anything other than returning zero. The int may or may not get allocated, depending on the compiler, that's the only thing that may change. If you have other bugs elsewhere, the allocation (or lack thereof) of said int may complicate your debugging, but other than that, there's nothing undefined about this program due to the static branch. Now, if I were writing a static analysis tool, I'd mark that code as UB just to help the programmer out. – Robert C. Barth Aug 23 '13 at 19:04
  • I'd also mark it as unreachable. – Robert C. Barth Aug 23 '13 at 19:05
  • @RobertC.Barth It's not up to your tool or any tool or even the compiler what UB is. The standard is the one who determines. If you want to prove I'm wrong, quote the standard saying so, like some of the answers do. – Yu Hao Aug 24 '13 at 00:04
  • I don't need a standard to tell me that the code in question (in the branch) will never run, anywhere, ever, and so, is not undefined, by default. The code has to at least have a chance to run first before its execution can be undefined. That's pretty basic. For a behavior to be defined OR undefined, it first has to execute. If it never executes, then it's nothing -- it doesn't exist (which is why the compiler will optimize it away). – Robert C. Barth Aug 24 '13 at 00:20
  • 2
    @RobertC.Barth I didn't say you are wrong about the conclusion; this simple piece of code probably isn't undefined behavior, as some of the answers say. But you have an incorrect definition of what **undefined behavior** in C is, that's all I'm trying to say. For a not so relevant example, if you are doing pointer arithmetic on a pointer that isn't pointing to elements of arrays, it's undefined behavior even if you never dereference the pointer. – Yu Hao Aug 24 '13 at 00:43
  • 3
    @RobertC.Barth You don't seem to be a C guy (from your lack of understanding of the question and from your profile). If your posts were proper answers they'd get downvoted. The question here is whether code with undefined semantics that never gets run still renders the whole program unconforming. I have nothing to support this, but I'd wager it does. – idoby Aug 27 '13 at 20:32
  • @busy_wait, read the top-voted and accepted answer: tl;dr version: it says you're wrong, with decent evidence, so you probably don't want to make that bet. I didn't make my comments answers because they are just that, comments. Feel free to down-vote my comments if it makes you feel good about yourself. – Robert C. Barth Aug 29 '13 at 17:38

9 Answers

73

Let's look at how the C standard defines the terms "behavior" and "undefined behavior".

References are to the N1570 draft of the ISO C 2011 standard; I'm not aware of any relevant differences in any of the three published ISO C standards (1990, 1999, and 2011).

Section 3.4:

behavior
external appearance or action

Ok, that's a bit vague, but I'd argue that a given statement has no "appearance", and certainly no "action", unless it's actually executed.

Section 3.4.3:

undefined behavior
behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

It says "upon use" of such a construct. The word "use" is not defined by the standard, so we fall back to the common English meaning. A construct is not "used" if it's never executed.

There's a note under that definition:

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

So a compiler is permitted to reject your program at compile time if its behavior is undefined. But my interpretation of that is that it can do so only if it can prove that every execution of the program will encounter undefined behavior. Which implies, I think, that this:

if (rand() % 2 == 0) {
    i = i / 0;
}

which certainly can have undefined behavior, cannot be rejected at compile time.

As a practical matter, programs have to be able to perform runtime tests to guard against invoking undefined behavior, and the standard has to permit them to do so.

Your example was:

if (0) {
    i = 1/0;
}

which never executes the division by 0. A very common idiom is:

int x, y;
/* set values for x and y */
if (y != 0) {
    x = x / y;
}

The division certainly has undefined behavior if y == 0, but it's never executed if y == 0. The behavior is well defined, and for the same reason that your example is well defined: because the potential undefined behavior can never actually happen.

(Unless INT_MIN < -INT_MAX && x == INT_MIN && y == -1 (yes, integer division can overflow), but that's a separate issue.)
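The guarded-division idiom, extended to cover the overflow case mentioned in the parenthetical above, might be sketched like this. The helper name `checked_div` and its out-parameter interface are assumptions made for illustration; they do not come from the standard or from any library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (name and interface are illustrative only).
 * Stores x / y in *quotient and returns true only when the division
 * has defined behavior; otherwise returns false and leaves *quotient
 * untouched. */
static bool checked_div(int x, int y, int *quotient)
{
    if (y == 0) {
        return false;            /* the division by zero is never evaluated */
    }
    if (INT_MIN < -INT_MAX && x == INT_MIN && y == -1) {
        return false;            /* the quotient would not fit in an int */
    }
    *quotient = x / y;           /* reached only when the result is defined */
    return true;
}
```

A caller would test the return value before reading the quotient, so the undefined cases are filtered out at run time rather than being reached.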

In a comment (since deleted), somebody pointed out that the compiler may evaluate constant expressions at compile time. Which is true, but not relevant in this case, because in the context of

i = 1/0;

1/0 is not a constant expression.

A constant-expression is a syntactic category that reduces to conditional-expression (which excludes assignments and comma expressions). The production constant-expression appears in the grammar only in contexts that actually require a constant expression, such as case labels. So if you write:

switch (...) {
    case 1/0:
    ...
}

then 1/0 is a constant expression -- and one that violates the constraint in 6.6p4: "Each constant expression shall evaluate to a constant that is in the range of representable values for its type.", so a diagnostic is required. But the right-hand side of an assignment does not require a constant-expression, merely a conditional-expression, so the constraints on constant expressions don't apply. A compiler can evaluate any expression that it's able to at compile time, but only if the behavior is the same as if it were evaluated during execution (or, in the context of if (0), not evaluated during execution).

(Something that looks exactly like a constant-expression is not necessarily a constant-expression, just as, in x + y * z, the sequence x + y is not an additive-expression because of the context in which it appears.)

Which means the footnote in N1570 section 6.6 that I was going to cite:

Thus, in the following initialization,
static int i = 2 || 1 / 0;
the expression is a valid integer constant expression with value one.

isn't actually relevant to this question.
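Although that footnote does not bear on the question, what it describes can be sketched as follows. The declaration of `i` is taken from the footnote; the functions `footnote_value` and `short_circuit_or` are hypothetical names added for illustration, and some compilers may still warn about the unevaluated `1 / 0`.

```c
/* An initializer for an object with static storage duration must be a
 * constant expression. The right operand of || is not evaluated because
 * the left operand is nonzero, so the initializer has the value 1. */
static int i = 2 || 1 / 0;

/* Hypothetical accessor so the stored value can be inspected. */
static int footnote_value(void)
{
    return i;
}

/* The same short-circuit rule at run time: when lhs is nonzero,
 * 1 / divisor is never evaluated, even if divisor is 0. */
static int short_circuit_or(int lhs, int divisor)
{
    return lhs || 1 / divisor;
}
```

Calling `short_circuit_or` with a zero `lhs` and a zero `divisor` would evaluate the division, so that combination has to be avoided by the caller.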

Finally, there are a few things that are defined to cause undefined behavior that aren't about what happens during execution. Annex J, section 2 of the C standard (again, see the N1570 draft) lists things that cause undefined behavior, gathered from the rest of the standard. Some examples (I don't claim this is an exhaustive list) are:

  • A nonempty source file does not end in a new-line character which is not immediately preceded by a backslash character or ends in a partial preprocessing token or comment
  • Token concatenation produces a character sequence matching the syntax of a universal character name
  • A character not in the basic source character set is encountered in a source file, except in an identifier, a character constant, a string literal, a header name, a comment, or a preprocessing token that is never converted to a token
  • An identifier, comment, string literal, character constant, or header name contains an invalid multibyte character or does not begin and end in the initial shift state
  • The same identifier has both internal and external linkage in the same translation unit

These particular cases are things that a compiler could detect. I think their behavior is undefined because the committee didn't want to, or couldn't, impose the same behavior on all implementations, and defining a range of permitted behaviors just wasn't worth the effort. They don't really fall into the category of "code that will never be executed", but I mention them here for completeness.

Keith Thompson
  • 2
    @EricPostpischil 6.6/4 says "Each constant expression shall evaluate to a constant that is in the range of representable values for its type." Wouldn't that exclude `1/0` from being a constant expression? – Casey Aug 22 '13 at 16:44
  • 2
    @EricPostpischil: I don't think that's quite right. Violating a constraint generally means that a compile-time diagnostic is required, not merely that something that might otherwise be a *foo* is not a *foo*. `1/0` is not a constant expression *in the context of the question* because it's not parsed as a *constant-expression*, merely as a *conditional-expression* that's part of an *assignment-expression*. `case 1/0:` would violate the constraint and require a diagnostic. – Keith Thompson Aug 23 '13 at 01:07
  • DR#109 seems to indicate that the program is not UB, see the new answer I just posted. – ouah Jan 01 '16 at 16:41
  • 2
    Re "*so we fall back to the common English meaning*", In the English meaning, the program uses a construct if it's present in the program. So why does your answer assume using a construct means executing a construct? Your conclusion doesn't follow from your explanation! – ikegami Apr 26 '17 at 18:25
  • @ikegami: Note that the authors of C89 used the phrase "Undefined Behavior" as a catch-all both for constructs which should be viewed as erroneous, and for "non-portable" constructs which would be correct on commonplace implementations but might behave unpredictably on some obscure ones. The fact that a construct invokes UB means that the Standard imposes no requirements upon how implementations behave, but does not imply any judgment as to whether an implementation could be suitable for any particular task without processing the construct usefully *anyhow*. – supercat Jun 14 '22 at 21:51
  • @supercat, No idea why you tagged me with that comment. My comment raised the issue that the answer is self-contradictory, and specifically about things not in the standard. – ikegami Jun 14 '22 at 22:19
  • @ikegami I apparently missed your comment in 2017. "*In the English meaning, the program uses a construct if it's present in the program.*" That's not an unreasonable interpretation, but my interpretation was that a program uses a construct if it uses it during execution. Otherwise, wouldn't `int n = 0; if (0) 1/n;` have undefined behavior? – Keith Thompson Jun 16 '22 at 14:24
  • @ikegami Then I don't think I understand how you're defining "use". Division by zero has undefined behavior. Division by zero is present in a program that includes that code fragment. My understanding is that no undefined behavior occurs because the division by zero does not occur during execution. I'm having trouble understanding your statement that "the program uses a construct if it's present in the program". Perhaps `if (0) 1/0;` makes the point more clearly. – Keith Thompson Jun 16 '22 at 23:29
  • Sorry, missed a lot of the context (since this is a 10 year old Q&A!) Anyway, I'm not debating your conclusion. Neither then nor now. I just pointed out that your premise points to the opposite conclusion. If something is used as the component in the construction of something else, we would say that component is used by the final product. Whether you're making a loaf of bread or erecting a building, "use" means "part of". But you are relying on "use" meaning the opposite. – ikegami Jun 17 '22 at 01:19
31

This article discusses this question in section 2.6:

int main(void){
      guard();
      5 / 0;
}

The authors consider that the program is defined when guard() does not terminate. They also find themselves distinguishing notions of “statically undefined” and “dynamically undefined”, e.g.:

The intention behind the standard [11] appears to be that, in general, situations are made statically undefined if it is not easy to generate code for them. Only when code can be generated, then the situation can be undefined dynamically.

[11] Private correspondence with committee member.

I would recommend looking at the entire article. Taken together, it paints a consistent picture.

The fact that the authors of the article had to discuss the question with a committee member confirms that the standard is currently fuzzy on the answer to your question.

Pascal Cuoq
  • 1
    That example presents difficulties *only* in whether you can statically determine whether its behavior is undefined. When it runs (assuming the behavior of `guard()` is not undefined), the behavior is undefined if and only if the statement `5 / 0;` is actually executed. (Note that a compiler could legitimately replace the evaluation of `5 / 0` with a call to `abort()` or something similar; the program would then abort if and only if execution reaches that point.) A compiler may *reject* that program only if it can determine that `guard()` will always terminate. – Keith Thompson Aug 22 '13 at 17:44
  • @KeithThompson To clarify the static/dynamic distinction in the article, 5/0 is considered *dynamic* because the compiler can generate code that divides by zero: just generate the usual code that divides by z after having set z to 0. Thus a naive compiler can generate a division instruction. A sophisticated compiler that determines that `guard()` does not terminate does not have to generate any code at all for 5/0. In contrast, there is no way to generate code for `(int)(void)5`, one cannot just generate the code for `(int)(void)z`, since that's not correct either. So the authors think that … – Pascal Cuoq Aug 22 '13 at 18:48
  • @KeithThompson … a compiler is allowed to reject the program `if (0) (int)(void)5;` because of the conundrum it presents to the naive compiler, whereas unreachable dynamic UB such as `if (0) 5 / 0;` is harmless. This is what transpired from their discussion with a committee member and I have seen a similar argument made elsewhere (but perhaps from the same source, especially since I don't remember where it was). I am going through the C99 rationale at the moment, if I see any mention of this I will come back and point it out. – Pascal Cuoq Aug 22 '13 at 18:51
  • 3
    `(int)(void)5` is a constraint violation. N1570 6.5.4, describing the cast operator: "Constraints: Unless the type name specifies a void type, the type name shall specify atomic, qualified, or unqualified scalar type, and the operand shall have scalar type.". `(void)5` does not have scalar type, so `(int)(void)5` violates that constraint, regardless of whether the code containing it is ever executed. – Keith Thompson Aug 22 '13 at 18:59
  • @KeithThompson Yes, they seem to have picked the wrong example, but inside the long list in J.2, there is one that isn't a constraint violation and that is “static”, surely? How about that old classic, “A nonempty source file does not end in a new-line character …”? There is no notion of reachability that applies to this one, but it isn't a constraint violation, is it? – Pascal Cuoq Aug 22 '13 at 20:13
  • That case seems fairly clear-cut, though in a different way. The program's behavior is undefined regardless of what happens during execution; you can't wrap the missing new-line in `if (0)`. It would have been cleaner IMHO to limit the behaviors to (a) treating the file as if it had a trailing new-line, or (b) treating it as a syntax error -- but things get more interesting if it's an included file that's missing the trailing new-line. Some of these cases seem like laziness -- correction, careful resource allocation -- on the part of the committee. – Keith Thompson Aug 22 '13 at 20:26
  • @KeithThompson: Some system's line-input functions may behave weirdly if a file ends with a partial line. If the OS does something weird with a partial line before a C implementation even sees it, a compiler might have no control over that. Further, if an included file were to end with `#define foo` and the include directive were followed by ` bar`, the authors of the Standard may not have wanted to forbid implementations from treating that as `#define foo bar` since some code-generation scripts might rely upon the ability to concatenate lines between files. – supercat Aug 28 '18 at 19:20
6

In this case the undefined behavior is the result of executing the code. So if the code is not executed, there is no undefined behavior.

Non-executed code could invoke undefined behavior if the undefined behavior were solely the result of the declaration of the code (e.g. if some case of variable shadowing were undefined).

Arnaud Le Blanc
3

I think it still is undefined behavior, but I can't find any evidence in the standard to support or deny me.

I think the program does not invoke undefined behavior.

Defect Report #109 addresses a similar question and says:

Furthermore, if every possible execution of a given program would result in undefined behavior, the given program is not strictly conforming. A conforming implementation must not fail to translate a strictly conforming program simply because some possible execution of that program would result in undefined behavior. Because foo might never be called, the example given must be successfully translated by a conforming implementation.
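The situation the defect report describes might look like the sketch below. It is an illustration only, not the example from the defect report itself; the body of `foo` and the driver `run` are assumptions.

```c
/* Executing this function's body would divide by zero, which is
 * undefined behavior. */
static int foo(int numerator)
{
    int zero = 0;
    return numerator / zero;
}

/* Hypothetical driver: foo is called only when call_foo is nonzero.
 * An execution with call_foo == 0 never reaches the division, so its
 * behavior is defined, and the translation unit still has to be
 * translated. */
static int run(int call_foo)
{
    if (call_foo) {
        return foo(1);
    }
    return 0;
}
```

Only executions that actually call `foo` run into the division; `run(0)` simply returns 0.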

Keith Thompson
ouah
2

I'd go with the last paragraph of this answer: https://stackoverflow.com/a/18384176/694576

... UB is a runtime issue, not a compiletime issue ...

So, no, there is no UB invoked.

Community
alk
2

The standard says, if I remember correctly, that an implementation is allowed to do anything from the moment a rule is broken. Maybe there are some special cases with a more global flavour (but I have never heard or read about anything like that)... So I would say: no, this can't be UB, because as long as the behavior is well defined, 0 is always false, so the rule can't be broken at runtime.

dhein
2

Only if the standard makes breaking changes and your code suddenly is no longer code that "never gets executed". But I don't see any logical way in which this can cause 'undefined behaviour'. It's not causing anything.

Hrishi
2

On the subject of undefined behaviour it is often hard to separate the formal aspects from the practical ones. This is the definition of undefined behaviour in the 1989 standard (I don't have a more recent version at hand, but I don't expect this to have changed substantially):

1 undefined behavior
  behavior, upon use of a nonportable or erroneous program construct or of
  erroneous data, for which this International Standard imposes no requirements
2 NOTE Possible undefined behavior ranges from ignoring the situation completely
  with unpredictable results, to behaving during translation or program execution
  in a documented manner characteristic of the environment (with or without the
  issuance of a diagnostic message), to terminating a translation or
  execution (with the issuance of a diagnostic message).

From a formal point of view I'd say your program does invoke undefined behaviour, which means that the standard places no requirement whatsoever on what it will do when run, just because it contains division by zero.

On the other hand, from a practical point of view I'd be surprised to find a compiler that didn't behave as you intuitively expect.

arshajii
Nicola Musatti
-1

It depends on how the expression "undefined behavior" is defined, and whether "undefined behavior" of a statement is the same as "undefined behavior" for a program.

This program looks like C, so a deeper analysis of what the C standard used by the compiler says (as some answers did) is appropriate.

In the absence of a specified standard, the correct answer is "it depends". In some languages, compilers try to guess what the programmer might have meant after the first error and still generate some code, according to the compiler's guess. In other, more pure languages, once something is undefined, the undefinedness propagates to the whole program.

Other languages have a concept of "bounded errors". For some limited kinds of errors, these languages define how much damage an error can produce. In particular, languages with implicit garbage collection frequently distinguish between errors that invalidate the type system and errors that do not.