I'm having trouble understanding the difference between unspecified and undefined behavior. I think trying to understand some examples would be useful. For instance, x = x++. The problem with this assignment is that:

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

This violates a "shall" rule without explicitly invoking undefined behavior, but it does involve UB according to:

The order of evaluation of the operands is unspecified. If an attempt is made to modify the result of an assignment operator or to access it after the next sequence point, the behavior is undefined.

Assuming none of these rules existed and there are no other rules that "invalidate" x = x++. The value of x would then be unspecified, right?

The doubt arose because it is sometimes argued that things in C are UB by "default" and are only valid if you can justify that the construction is valid.

Edit: As pointed out by P.W, there is a somewhat related, well-received, version of this question for C++: What made i = i++ + 1; legal in C++17?.

Joseph Quinsey
jinawee
  • The duplicate is too broad, I don't believe it should be used for closing this question, which is very specific. I will re-open. – Lundin Jan 11 '19 at 14:12
  • "Undefined" means *anything* can happen, or nothing. It might cause demons to fly out of your nose, as one of the common aphorisms goes. "Unspecified" means that there are multiple alternatives, but which specific one is exercised is not determined by the standard. – John Bollinger Jan 11 '19 at 14:15
  • @usr To be honest, this always left me puzzled. With `x = x++`, it is unclear if `x` is assigned the "old" or the "new" value (so unspecified), but that's the only ambiguity I can imagine. What else could happen there making "flying demons", or making the program crash? – glglgl Jan 11 '19 at 14:23
  • @usr If it were left unspecified, if x is previously 1, then 1 or 2 after the assignment, right? Tbh, my question arose because some people say that constructions are UB unless the standard says otherwise. But I'm thinking things are always unspecified by default. – jinawee Jan 11 '19 at 14:27
  • The C# language gives it defined behavior. It does so in an uncomplicated way, takes only 5 short lines of text. The obvious thing happens when you use it, team members stop talking to you or your git pull requests get ignored. – Hans Passant Jan 11 '19 at 14:31
  • @HansPassant So C++17 wins since they gave it defined behavior with just 2 lines? :) – Lundin Jan 11 '19 at 14:36
  • @Lundin C# defines a whole host of other things in those 5 short lines. It's a draw. – Caleth Jan 11 '19 at 14:41
  • @glglgl: C was designed for use on a huge range of computing equipment, including some very primitive machines. In some C implementations, `long` or even `int` types might be implemented by constructing them from narrower integers. For these implementations, `x++` is not a single operation. It may require incrementing one byte, testing for carry, and so on. So, in `x = x++;`, the `++` is not necessarily before or after the `=`. It could be partly before and partly after. The C committee decided not to deal with this and just say the behavior is not defined. – Eric Postpischil Jan 11 '19 at 14:44
  • @EricPostpischil Even this I can understand. The value which results can be any garbage. But the effects of this should just affect the value (having unspecified result) instead of affecting the whole program (having undefined behaviour). – glglgl Jan 11 '19 at 14:49
  • @glglgl If the result is some garbage value, it might be a trap representation leading to UB. I don't know if this is the only reason, but this might be at least one. – Ctx Jan 11 '19 at 14:57
  • @Ctx Oh, that makes sense. Thank you. – glglgl Jan 11 '19 at 15:05
  • @glglgl One example of UB would be `int x=0; x = x++; if(x==2){ do_stuff(); }`, after which the compiler is free to assume "aha, but x is never 2 so I can remove this whole code". And it is equally free to replace it all with `do_stuff();`. Or anything else it might fancy. – Lundin Jan 11 '19 at 15:15
  • @Lundin this reads like examples for unspecified behaviour, not for undefined behaviour. – Ctx Jan 11 '19 at 15:29
  • @Ctx No, since there's no telling what value x might actually have or what the programmer intended. – Lundin Jan 12 '19 at 19:34
  • @glglgl: Situations where it might be impractical for some implementations to process code predictably are classified as UB, without regard for whether most implementations would process such code usefully, or even the fact that previous versions of the Standard had usefully defined the behavior of the code. The assumption is that people seeking to write quality compilers will recognize that they should process such code usefully when practical, without relying upon the authors of the Standard to tell them that. – supercat Jul 18 '19 at 22:53
  • Assuming pre-17, for that particular case, undefined behavior would probably mean that the compiler simply ignores the fact that `i` on both sides is the same variable. For the statement `j = i++ + 1`, there are many ways to sequence the operations that give the same final result. But when the variable is the same, those ways no longer give the same result. The compiler might even select which one to use depending on things like available registers... For the statement `i = i++ + 1`, it would not make much sense to make it unspecified behavior. Either the result is known or it should not be used. – Phil1970 Aug 20 '22 at 21:42

5 Answers

I'm having trouble understanding the difference between unspecified and undefined behavior.

Then let's start with the definitions of those terms from the Standard:


undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

EXAMPLE An example of undefined behavior is the behavior on integer overflow.

(C2011, 3.4.3)
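
As a rough sketch of what the integer-overflow example means in practice: because evaluating an overflowing signed `a + b` is already undefined, any check has to run before the addition, not after it. The helper name `checked_add` is hypothetical and only for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *out and returns true only when
   the sum is representable in int. The test runs before the addition,
   because an overflowing a + b would itself be undefined behavior. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   /* would overflow: *out is left untouched */
    }
    *out = a + b;
    return true;
}
```

A caller would test the return value and only then use `*out`; checking the sum after computing it is too late, since the undefined behavior has already occurred.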


unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

EXAMPLE An example of unspecified behavior is the order in which the arguments to a function are evaluated.

(C2011, 3.4.4)
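
The argument-evaluation example can be sketched as follows. All names here (`g1`, `g2`, `f`, `call_f`, `first_called`) are illustrative assumptions. The point is that both calls always happen and the result is always the same, while the order in which the two calls run is left to the implementation.

```c
/* Illustrative names only. g1 and g2 note the order in which they run. */
static char order[3];   /* up to two marks plus a terminating '\0' */
static int  noted;

static void note(char mark)
{
    if (noted < 2) {
        order[noted++] = mark;
    }
}

int g1(void) { note('1'); return 10; }
int g2(void) { note('2'); return 20; }
int f(int a, int b) { return a - b; }

/* Both arguments are evaluated before f is entered, but the standard
   does not say whether g1 or g2 runs first. */
int call_f(void)
{
    noted = 0;
    order[0] = order[1] = order[2] = '\0';
    return f(g1(), g2());
}

/* Mark of the function that ran first in the last call_f(): '1' or '2'. */
char first_called(void) { return order[0]; }
```

`call_f()` returns -10 on every conforming implementation, but `first_called()` may legitimately report either function, so a program that depends on one particular answer is relying on unspecified behavior.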


You remark that

The doubt arose because it is sometimes argued that things in C are UB by "default" and are only valid if you can justify that the construction is valid.

It perhaps aggrandizes that too much to call it an argument, as if there were some doubt about its validity. In truth, it reflects explicit language from the standard:

If a "shall" or "shall not" requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined".

(C2011, 4/2; emphasis added)

When you posit

Assuming none of these rules existed and there are no other rules that "invalidate" x = x++.

, that doesn't necessarily change anything. In particular, removing the explicit rule that the order of evaluation of the operands is unspecified does not make the order specified. I'd be inclined to argue that the order remains unspecified, but the alternative is that the behavior would be undefined. The primary purpose served by explicitly saying it's unspecified is to sidestep that question.

The rule explicitly declaring UB when an object is modified twice between sequence points is a little less clear, but falls in the same boat. One could argue that the standard still would not define behavior for your example case, leaving it undefined. I think that's a bit more of a stretch, but that's exactly why it is useful to have an explicit rule, one way or the other. It would be possible to define behavior for your case (Java does, for example), but C chooses not to do so, for a variety of technical and historical reasons.

The value of x would then be unspecified, right?

That's not entirely clear.

Please understand, too, that the various provisions of the standard for the most part do not stand alone. They are designed to work together, as a (mostly) coherent whole. Removing or altering random provisions has considerable risk of producing inconsistencies or gaps, leaving it difficult to reason about the result.

John Bollinger
  • Would then "unspecified behavior" have to be explicitly mentioned? I have understood that in "other behavior where this International Standard provides two or more possibilities ", meant that the rules of the Standard lead to several possible behaviors. But I'm starting to think that the options must be explicitly stated. – jinawee Jan 11 '19 at 15:01
  • @jinawee, Unspecified behavior requires that "**[the] Standard** provides two or more possibilities". There is room for argument here, but I do think that's a stronger condition than "two or more possibilities exist." I do not think the magic words "it is unspecified" necessarily need to appear, as opposed to other wording that conveys the same idea, but I do think the standard needs to set out specific alternatives and explicitly leave the choice among them to implementations. – John Bollinger Jan 11 '19 at 17:21
  • Thanks. Things seem more coherent if, when there are several ways the abstract state machine can evolve and the Standard doesn't say anything, the result is UB instead of unspecified behavior. That would mean that a function call like `f( g1(), g2() )` would be UB had the Standard not said that the order of evaluation of arguments is unspecified. Which is quite reasonable. – jinawee Jan 11 '19 at 19:11
  • @jinawee: A key point which is made clear in the published Rationale for the Standard is that characterizing an action as UB merely means that compilers that have a good reason to process it any particular way may do so; it in no way denies the existence of a commonplace behavior (which the authors of the Standard would refer to as a "popular extension"). Support for such "popular extensions" is recognized as a quality of implementation issue, since questions of whether an implementation has a "good reason" for doing something would be best resolved by people who work with it. – supercat Jul 18 '19 at 23:05

Modern C11/C17 has changed the text, but it has pretty much the same meaning. C17 6.5/2:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.

There are several slightly different issues here, mixed into one:

  • Between sequence points, x is written to (side effect) more than once. This is UB as per the above.
  • Between sequence points, the expression contains at least one side effect on a variable together with a value computation of that same variable that is not used to determine the value to be stored. This is also UB as per the above.
  • In the expression x = x++, the evaluation of the operand x is not sequenced in relation to the operand x++. The evaluation order is unspecified behavior as per C17 6.5.16.

    The side effect of updating the stored value of the left operand is sequenced after the value computations of the left and right operands. The evaluations of the operands are unsequenced.

If not for the first cited part labelling this UB, then we still wouldn't know if the x++ would be sequenced before or after the evaluation of the left x operand, so it is hard to reason about how this could become "just unspecified behavior".
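
To make the rule concrete, here is a small sketch that contrasts statements which satisfy 6.5/2 with ones that do not. The examples `i = i + 1`, `a[i] = i`, `i = ++i + 1` and `a[i++] = i` are the ones given in the footnote to the older C99 wording of this rule. The function name `sequencing_examples` is hypothetical, and the undefined statements appear only in a comment so that the block itself stays well-defined.

```c
/* Runs only well-defined statements and returns the final value of i.
   The caller supplies an array a with n elements. */
int sequencing_examples(int a[], int n)
{
    int i = 0;

    i = i + 1;      /* fine: i is read only to compute the value stored */
    if (i < n) {
        a[i] = i;   /* fine: i is only read; the object modified is a[i] */
    }

    /* Undefined under C17 6.5/2, so left as comments:
         i = ++i + 1;    two unsequenced side effects on i
         a[i++] = i;     side effect on i unsequenced with a read of i
         i = i++;        the case from the question
    */
    return i;
}
```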

C++17 actually fixed this part, making it well-defined there, unlike in C or earlier C++ versions. They did so by defining the sequence order (C++17 8.5.18):

In all cases, the assignment is sequenced after the value computation of the right and left operands, and before the value computation of the assignment expression. The right operand is sequenced before the left operand.

I don't see how there can be any middle-ground here; either the expression is undefined or it is well-defined.


Unspecified behavior is deterministic behavior which we cannot know or assume anything about. But unlike undefined behavior, it won't cause crashes and random program behavior. A good example is a() + b(). We can't know which function will be executed first, and the program doesn't even have to be consistent if the same line appears later on in the same program. But we can know that both functions will be executed, one before the other.

Contrast that with x = a() + b() + x++;, which is undefined behavior, so we can't assume anything about it. One, both or none of the functions might be executed, in any order. The program might crash, produce incorrect results, produce seemingly correct results or do nothing at all.
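
One way to get back to defined behavior is to split the expression into separate statements. The sketch below assumes the intent was simply "add a(), b() and the old value of x"; that intent is a guess, since the original expression has no defined meaning to preserve. The counters `calls_a` and `calls_b` and the function `sum_defined` are hypothetical.

```c
/* a() and b() are the placeholder functions from the text; here they
   only count how often they were called. */
int calls_a, calls_b;

int a(void) { ++calls_a; return 1; }
int b(void) { ++calls_b; return 2; }

/* Well-defined replacement under the stated assumption about intent.
   Each call sits in its own statement, so the order is now fixed and
   x is never modified twice without sequencing. */
int sum_defined(int x)
{
    int ra = a();        /* a() is called first */
    int rb = b();        /* then b() */
    return ra + rb + x;  /* x is only read here */
}
```

Writing `x = sum_defined(x);` then modifies x exactly once, and both functions are guaranteed to have run exactly once each.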

Lundin

There have been instances in other programming languages when a previously undefined behavior has become defined in a later standard. One instance I can remember is in C++ where what was undefined behavior in C++11 became well defined in C++17.

i = i++ + 1; // the behavior is undefined in C++11 

i = i++ + 1; // the behavior is well-defined in C++17. The value of i is incremented

There has been a well received question on this topic. What made this well defined is a guarantee in the C++17 standard that

The right operand is sequenced before the left operand.
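
Staying in C, where `i = i++ + 1` itself remains undefined, the order that C++17 guarantees can be spelled out by hand. This is only a sketch of that sequencing, not something a C compiler is required to do for the original expression. The name `cxx17_order` is hypothetical, and the function assumes `i` is less than `INT_MAX` so the increment cannot overflow.

```c
/* Hand-written equivalent of the order C++17 guarantees for
   i = i++ + 1. Each step is its own statement, so this is
   well-defined C (assuming i < INT_MAX). */
int cxx17_order(int i)
{
    int old = i;        /* right operand first: i++ yields the old value */
    i = i + 1;          /* ...and its side effect on i completes */
    int rhs = old + 1;  /* value of i++ + 1 */
    i = rhs;            /* the assignment's store is sequenced last */
    return i;
}
```

The final store wins, so the net effect is that i ends up one larger than it started.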

So in a sense, it is up to the standards committee to change the standard and provide strong guarantees to make it well defined.

But I do not think that something as simple as x = x++; will be made unspecified. It will either be undefined or well-defined.

P.W

The problem seems to be that what i = i++; should mean cannot be properly defined:

Interpretation 1:

    int i1= i;
    int i2= i1+1;
    i = i2;
    i = i1;

In this interpretation the value of i is retrieved and 1 is added (i2), then this i2 is saved to i but the original i in i1 is further used in the assignment (because here the ++ is interpreted to apply to the value after it has been used) and so i is unchanged.

Interpretation 2:

    int i1= i;
    i1= i1+1;
    i= i1;
    int i2= i;
    i= i2;

In this interpretation the i++ is performed first (and modifies i) and now the modified i is retrieved again and used in the assignment (so i has the incremented value).

Interpretation 3:

    int i1= i;
    i = i1;
    int i2= i1+1;
    i= i2;

In this interpretation first the assignment of i to i is executed and then i is incremented.

To me, all these three interpretations are correct, and there could even be a few more interpretations, but they each do something different. Hence the standard could/did not define it and which interpretation a compiler uses is up to the compiler builder and as a result which behavior a compiler exhibits is undefined: undefined behavior.

(A compiler could even generate a jmp toTheMoon instruction or ignore the whole statement.)
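
The three interpretations can be written as ordinary, well-defined functions to compare what each leaves in i. This is a sketch for comparison only; the function names are made up, and nothing obliges a real compiler to pick any of these.

```c
/* Each function follows one interpretation step by step and
   returns the final value of i. */
int interpretation1(int i)
{
    int i1 = i;
    int i2 = i1 + 1;
    i = i2;     /* the increment is stored... */
    i = i1;     /* ...then overwritten by the assignment */
    return i;
}

int interpretation2(int i)
{
    int i1 = i;
    i1 = i1 + 1;
    i = i1;     /* increment first */
    int i2 = i; /* re-read the already incremented i */
    i = i2;
    return i;
}

int interpretation3(int i)
{
    int i1 = i;
    i = i1;     /* assignment first */
    int i2 = i1 + 1;
    i = i2;     /* increment stored last */
    return i;
}
```

Starting from 5, the first interpretation leaves 5, while the second and third both leave 6, although they get there by different routes.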

Paul Ogilvie

The order of evaluation and application of the side effect of ++ is left unspecified - the language standard does not mandate left-to-right or right-to-left order (for arithmetic operators, anyway). Consider the well-defined expression a = b++ * ++c. The expressions a, b++, and ++c may be evaluated in any order. Similarly, the side effects to b and c may be applied immediately after evaluation, or deferred until just before the next sequence point, or anywhere in between. All that matters is that the result of b * (c+1) is computed before being assigned to a. The following is one perfectly legal evaluation:

tmp <- c + 1
a <- b * tmp
c <- c + 1
b <- b + 1

So is this:

c <- c + 1
a <- b * c
b <- b + 1

So is this:

tmp1 <- b
b <- b + 1
tmp2 <- c + 1
a <- tmp1 * tmp2
c <- c + 1

What matters is that, no matter what order of evaluation is chosen, you will always get the same result.
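
As a quick check of that claim, the three orderings can be written out as plain C and compared. This is a sketch; `struct state`, `order1` to `order3` and `same_state` are names invented for the example.

```c
/* Final values of a, b and c after evaluating a = b++ * ++c. */
struct state { int a, b, c; };

struct state order1(int b, int c)
{
    int tmp = c + 1;
    int a = b * tmp;
    c = c + 1;
    b = b + 1;
    return (struct state){ a, b, c };
}

struct state order2(int b, int c)
{
    c = c + 1;
    int a = b * c;
    b = b + 1;
    return (struct state){ a, b, c };
}

struct state order3(int b, int c)
{
    int tmp1 = b;
    b = b + 1;
    int tmp2 = c + 1;
    int a = tmp1 * tmp2;
    c = c + 1;
    return (struct state){ a, b, c };
}

/* Returns 1 when both states hold the same three values. */
int same_state(struct state x, struct state y)
{
    return x.a == y.a && x.b == y.b && x.c == y.c;
}
```

With b = 2 and c = 3, each ordering ends with a = 8, b = 3 and c = 4.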

x = x++ could be evaluated in either of the following ways, depending on when the side effect is applied:

Option 1         Option 2
--------         --------
tmp <- x         tmp <- x
x <- x + 1       x <- tmp
x <- tmp         x <- x + 1

The problem is that the two methods give different results. Other, completely different methods may be available based on the instruction set that give different results than these two.

The language standard doesn't mandate what to do when an expression gives different results depending on the order in which it is evaluated - it doesn't place any requirements on the compiler or the runtime environment to pick either option. This is what undefined means - literally, the behavior is not defined by the language specification. You will get a result, but it's not guaranteed to be consistent, or the result you would expect.

Undefined does not mean illegal. Nor does it mean your code is guaranteed to crash. It just means that the result is not predictable or guaranteed to be consistent. An implementation doesn't even have to issue a diagnostic saying "hey, dummy, this is a bad idea."

An implementation is free to define and document a behavior left undefined by the standard (such as MSVC defining fflush on input streams). A number of compilers take advantage of certain behaviors being undefined to perform some optimizations. And some compilers do issue warnings for common mistakes like x = x++.

John Bode
  • My reasoning was: there are two possible options, so this is unspecified behavior and the compiler can choose how to implement it. But it makes much more sense to understand it as: there are two options and the standard doesn't say it's unspecified behavior, so it's UB by default. Of course, it makes more sense that the standard explicitly says it's UB so there is no room for confusion. – jinawee Jan 11 '19 at 19:01