
Although there are plenty of links about this subject on SO, I think that something is missing: a clear explanation in plain language of the differences between unspecified behavior (UsB), undefined behavior (UB) and implementation-defined behavior (IDB), with a detailed but easy explanation of each use case and example.

Note: I made the UsB acronym up for the sake of compactness in this WIKI, but don't expect to see it used elsewhere.

I know this may seem a duplicate of other posts (the closest one is this), but before anyone marks this as a duplicate, please consider the problems with all the material I have already found (and I'm going to make a community WIKI out of this post):

  • Too many scattered examples. Examples are not bad, of course, but sometimes one cannot find an example that nicely fits the problem at hand, so they may be confusing (especially for newbies).

  • Examples are often only code with few explanations. On such delicate matters, especially for (relative) newbies, a more top-down approach could be better: first a clear, simple explanation with an abstract (but not legalistic) description, then some simple examples with explanations on why they trigger some behavior.

  • Some posts mix C and C++ examples. C and C++ do not always agree on what they deem UsB, UB and IDB, so an example can be misleading for someone not proficient in both languages.

  • When a definition of UsB, UB, and IDB is given, usually it is a plain citation of the standards, which sometimes may be unclear or too difficult to digest for newbies.

  • Sometimes the citations of the standards are partial. Many posts cite the standard only for the parts that are useful for the problem at hand, which is good, but lacks generality. Moreover, citations of the standards are often not accompanied by any explanation (bad for beginners).

Since I am not a super-expert on this subject myself, I will make a community WIKI so that anyone interested can contribute and improve the answer.

In order not to defeat my purpose of creating a structured, beginner-friendly WIKI, I'd like posters to follow a few simple guidelines when editing the WIKI:

  • Categorize your use case. Try to put your example/code under an already existing category, if applicable, otherwise create a new one.

  • First the plain-words description. First describe with simple words (without oversimplifying, of course - quality first!) the example or the point you are trying to make. Then put code samples or citations.

  • Cite the standards by reference. Don't post snippets of various standards, but give clear references (e.g. C99 WG14/N... section 1.4.7, paragraph ...) and post a link to the relevant resource, if possible.

  • Prefer free online resources. If you want to cite books or non-freely available resources that's ok (and may improve the quality of the WIKI), but try to add also some links to free resources. This is really important especially for ISO standards. You are welcome to add links to official standards, but try to add an equivalent link to freely available drafts as well. And please don't replace links to drafts with references to official standards, add to them. Even some Computer Science departments in some universities don't have a copy of the ISO standard(s), let alone most programmers at large!

  • Don't post code unless really necessary. Post code only if an explanation using only plain English would be awkward or unclear. Try to limit code samples to one-liners. Post links to other SO Q&A instead.

  • Don't post C++ examples. I'd like this to become a sort of FAQ for C (if someone wants to start a twin thread for C++ that would be great, though). Relevant differences with C++ are welcome, but only as side notes. That is, after you explain the C case thoroughly you may add a couple of statements about C++ if this would help a C programmer when switching to C++, but I wouldn't want to see examples with more than, say, 20% C++ stuff. Usually a simple note like "(C++ behaves differently in this case)" plus a relevant link should be enough.

Since I'm fairly new to SO I hope I'm not breaking any rule by starting a Q&A this way. Sorry if this is the case. The mods are welcome to let me know about it.

Community
  • This seems a little broad for SO. However, relevant to your question, I once stumbled upon this list which is interesting: https://www.securecoding.cert.org/confluence/display/seccode/CC.+Undefined+Behavior – asveikau Aug 24 '13 at 17:45
  • So you have diagnosed a lack of three definitions that would take ten lines at most (if they are indeed missing. I seem to have seen them several times), and your solution to that problem is three screens of, for lack of a better phrase, terms of services about how to edit an entry that you have opened on a site that is not really a wiki? And the reason you introduce acronyms that nobody else uses is to make it simpler for beginners, too? – Pascal Cuoq Aug 24 '13 at 18:14
  • @Pascal Please reread my post. I **never** said I diagnosed a lack of definitions tout court. I tried to explain that I found too many sources of information on the subject, but they lacked organization and sometimes the definitions were a bit too technical for newbies. As for the fact this is not a WIKI, well I may have misunderstood the feature, but when I see the term "WIKI" I think it is to be used as a WIKI, and I wrote a WIKI post. If SO wiki is not really a wiki feature, well sorry for the misunderstanding, but anyone then can propose to close this post. – LorenzoDonati4Ukraine-OnStrike Aug 25 '13 at 00:55
  • @Pascal as for the UsB acronym, yes, I think it is useful. I never tried to pretend it was standard (I put a note upfront about that). Acronyms are for abbreviating things where it is useful to abbreviate. If this wiki-post will have some success (otherwise I wouldn't have wasted time in writing it - I don't even gain rep for it - I wrote it thinking it was useful), I suppose the term "unspecified behavior" will be used a lot, so using UsB will save a lot of boilerplate terminology. – LorenzoDonati4Ukraine-OnStrike Aug 25 '13 at 00:59
  • I like the effort you put into this, although I am not sure this whole thing is very fitting for SO. I'd rather see this as a kind of external resource, for example a blog post, that people can link to on questions where the need arises. – Andreas Grapentin Aug 31 '13 at 19:29
  • Well, I keep seeing questions about this over and over on SO. Mind, not specific questions, but questions (and answers) that show a lack of understanding of the basics (which are not easy for the beginner). I'm sorry if this seems too broad. I also searched meta-SO before posting, and although there were some critical posts, there were no clear rules on how to use the wiki. As I said in answering another comment, anyone could vote to close this (I'll not take it personally ;-) if they find it really OT. OTOH it seems that some users find it useful, since I keep getting some positive feedback. – LorenzoDonati4Ukraine-OnStrike Aug 31 '13 at 19:54
  • Is there a better tag we can find or use for implementation-specific behavior? Abbreviations in tags make them very hard to find. The tag is useless in the current state. – Charles Sep 22 '13 at 21:10
  • @Charles You are right. I tried several options trying to avoid the 25 char limit for tags, but to no avail. The option "IDB" is equally problematic. This notwithstanding, I don't think it is *completely* useless, but I agree that it is far from ideal in its current state. Anyway it is better than nothing. Maybe in the future the limit will be raised, so that a fully spelled-out synonym could be created. Suggestions are welcome. – LorenzoDonati4Ukraine-OnStrike Sep 22 '13 at 21:27

2 Answers


C standards define UsB, UB and IDB in a way that can be summarized as follows:

Unspecified Behavior (UsB)

This is a behavior for which the standard gives some alternatives among which the implementation must choose, but it doesn't mandate how and when the choice is to be made. In other words, the implementation must accept user code triggering that behavior without erroring out and must comply with one of the alternatives given by the standard.

Be aware that the implementation is not required to document anything about the choices made. These choices may also be non-deterministic or dependent (in an undocumented way) on compiler options.

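To summarize: the standard gives some possibilities among which to choose; the implementation chooses when and how the specific alternative is selected and applied.

Note that the standard may provide a really large number of alternatives. The typical example is the initial value of local variables that are not explicitly initialized. The standard says that this value is indeterminate: either an unspecified value, that is, any valid value for the variable's data type, or a trap representation. (Caveat: in C11, reading an uninitialized local variable whose address is never taken, i.e. one that could have been declared with the register keyword, is UB rather than UsB, so the example strictly holds only for variables whose address is taken and whose type has no trap representations.)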

To be more specific consider an int variable: an implementation is free to choose any int value, and this choice can be completely random, non-deterministic or be at the mercy of the whims of the implementation, which is not required to document anything about it. As long as the implementation stays within the limits stated by the standard this is ok and the user cannot complain.

Undefined Behavior (UB)

As the name indicates, this is a situation in which the C standard doesn't impose or guarantee what the program would or should do. All bets are off. Such a situation:

  • renders a program either erroneous or nonportable

  • doesn't require absolutely anything from the implementation

This is a really nasty situation: as long as there is a piece of code that has undefined behavior, the entire program is considered erroneous and the implementation is allowed by the standard to do anything.

In other words, the presence of a cause of UB allows the implementation to completely ignore the standard, as far as the program triggering the UB is concerned.

Note that the actual behavior in this case may cover an unlimited range of possibilities. The following is by no means an exhaustive list:

  • A compile-time error may be issued.
  • A run-time error may be issued.
  • The problem is completely ignored (and this may lead to program bugs).
  • The compiler silently throws the UB-code away as an optimization.
  • Your hard disk may be formatted.
  • Your computer may erase your bank account and ask your girlfriend for a date.

I hope the last two (half-serious) items can give you the right gut feeling about the nastiness of UB. And even though most implementations will not insert the necessary code to format your hard drive, real compilers do optimize!
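To make the "compiler silently throws the UB-code away" item concrete, here is a minimal sketch based on signed integer overflow, which is UB in C. The function names are made up for illustration. The first function asks "does x + 1 overflow?" by actually overflowing, so an optimizer is entitled to assume the overflow never happens and may reduce the whole test to a constant. The second asks the same question without ever leaving defined territory.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical example: relies on signed overflow, which is UB when
   x == INT_MAX. A compiler may assume overflow never happens and
   compile this as if it were "return false". */
static bool next_overflows_fragile(int x)
{
    return x + 1 < x;
}

/* Same question, asked without performing the overflowing addition. */
static bool next_overflows(int x)
{
    return x == INT_MAX;
}
```

Calling next_overflows_fragile(INT_MAX) is exactly the case the standard says nothing about, so its result should not be relied upon; for every other argument both functions agree.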

Terminology Note: Sometimes people argue that a piece of code which the standard deems a source of UB works in a documented way in their implementation/system/environment, and therefore cannot really be UB. This reasoning is wrong, but it is a common (and somewhat understandable) misunderstanding: when the term UB (and also UsB and IDB) is used in a C context, it is a technical term whose precise meaning is defined by the standard(s). In particular, the word "undefined" loses its everyday meaning. Therefore it doesn't make sense to offer examples where erroneous or nonportable programs produce "well-defined" behavior as counterexamples. If you try, you miss the point. UB means that you lose all the guarantees of the standard. If your implementation provides an extension, then your guarantees are only those of your implementation. If you use that extension, your program is no longer a strictly conforming C program (in a sense, it is no longer a C program, since it doesn't follow the standard any longer!).

Usefulness of undefined behavior

A common question about UB is something along these lines: "If UB is so nasty, why doesn't the standard require an implementation to issue an error when faced with UB?"

First, optimizations. Allowing implementations not to check for possible causes of UB allows lots of optimizations that make a C program extremely efficient. This is one of the features of C, although it makes C a source of many pitfalls for beginners.

Second, the existence of UB in the standards allows a conforming implementation to provide extensions to C without being deemed non-conforming as a whole.

As long as an implementation behaves as mandated for a conforming program, it is itself conforming, although it may provide non-standard facilities that may be useful on specific platforms. Of course the programs using those facilities will be nonportable and will rely on documented UB, i.e. behavior that is UB according to the standard, but that an implementation documents as an extension.

Implementation-defined Behavior (IDB)

This is a behavior that can be described in a way similar to UsB: the standard provides some alternatives and the implementation chooses one, but the implementation is required to document exactly how the choice is made.

This means that a user reading her compiler's documentation must be given enough information to predict exactly what will happen in the specific case.

Note that an implementation that doesn't fully document an IDB cannot be deemed conforming. A conforming implementation must document exactly what happens in any case that the standard declares IDB.
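As a small illustration, the sketch below probes three things that C leaves implementation-defined: whether plain char is signed, the result of right-shifting a negative signed value, and how many bits an int occupies. The function names are invented for this example. The values they return are whatever your compiler documents, so the code shows how to query them rather than what they "are".

```c
#include <limits.h>

/* Whether plain char behaves as signed or unsigned is implementation-defined. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Right-shifting a negative signed value yields an implementation-defined
   result (often an arithmetic shift, but the documentation decides). */
static int shift_negative_right(void)
{
    int v = -8;
    return v >> 1;
}

/* Storage size of an int in bits; both sizeof(int) and CHAR_BIT are
   implementation-defined, within minimums set by the standard. */
static unsigned bits_in_int(void)
{
    return (unsigned)(sizeof(int) * CHAR_BIT);
}
```

A portable program either avoids depending on these results or checks them (for example against the macros in limits.h) before relying on them.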



Examples of unspecified behavior

Order of evaluation

Function arguments

The order of evaluation of function arguments is unspecified (see CERT rule EXP30-C).

For instance, in c(a(), b()); it is unspecified whether the function a is called before or after b. The only guarantee is that both are called before the c function.
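The following sketch makes that observable. The functions a, b and c mirror the names used above; the call_log array and the two run_... wrappers are assumptions added for illustration. After run_unspecified_order() the log holds either a, b, c or b, a, c, and a portable program has to accept both. If the order matters, sequencing the calls in separate statements removes the ambiguity.

```c
/* Hypothetical bookkeeping: records which function ran in which position. */
static int call_log[3];
static int call_count = 0;

static int a(void) { call_log[call_count++] = 'a'; return 1; }
static int b(void) { call_log[call_count++] = 'b'; return 2; }
static int c(int x, int y) { call_log[call_count++] = 'c'; return x + y; }

/* a() and b() may run in either order; c() always runs last. */
static int run_unspecified_order(void)
{
    call_count = 0;
    return c(a(), b());
}

/* Separate statements force a() to run before b(). */
static int run_fixed_order(void)
{
    call_count = 0;
    int x = a();
    int y = b();
    return c(x, y);
}
```

Both wrappers return the same value here because a and b do not depend on each other; the difference only shows up when the arguments have side effects that interact.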



Examples of undefined behavior

Pointers

Dereferencing of null pointer

Null pointers are used to signal that a pointer does not point to valid memory. As such, it does not make much sense to try to read or write to memory via a null pointer.

Technically, this is undefined behavior. However, since this is a very common source of bugs, most C environments ensure that most attempts to dereference a null pointer will immediately crash the program (usually killing it with a segmentation fault). This guard is not perfect because of the pointer arithmetic involved in references to arrays and/or structures, so even with modern tools, dereferencing a null pointer may format your hard drive.
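Since the crash is a courtesy of the environment and not a guarantee, the only portable protection is to test the pointer before using it. A minimal sketch, with read_or being a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: returns the pointed-to value, or fallback if p is null.
   The dereference is only reached when p is known to be non-null. */
static int read_or(const int *p, int fallback)
{
    return p != NULL ? *p : fallback;
}
```

The check has to come before the dereference; testing the pointer after it has already been dereferenced gives the compiler licence to drop the test.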

Dereferencing of uninitialized pointer

Just as with null pointers, dereferencing a pointer before explicitly setting its value is UB. Unlike with null pointers, most environments do not provide any safety net against this sort of error, except that compilers can warn about it. If you compile your code anyway, you are likely to experience the whole nastiness of UB.

Dereferencing of invalid pointers

An invalid pointer is a pointer that contains an address that is not within any allocated memory area. Common ways to create invalid pointers are calling free() on a pointer (after the call the pointer is invalid, which is pretty much the point of calling free()) and using pointer arithmetic to get an address that is beyond the limits of an allocated memory block.

This is the most evil variant of pointer-dereferencing UB: there is no safety net, there is no compiler warning, there is just the fact that the code may do anything. And commonly, it does: many malware attacks exploit this kind of UB to make programs behave as the attacker wants (installing a trojan or keylogger, encrypting your hard drive, etc.). The possibility of a formatted hard drive becomes very real with this kind of UB!
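Because nothing in the language catches these errors, the usual defence is discipline in the code itself. The sketch below shows two common habits, using helper names invented for this example: clearing a pointer right after freeing it, so that a later mistake becomes a null check rather than a dangling access, and checking an index against the block length before writing.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: frees the block and nulls the caller's pointer,
   so the caller no longer holds a dangling address. */
static void free_and_clear(int **pp)
{
    free(*pp);
    *pp = NULL;
}

/* Hypothetical helper: writes buf[index] only if index lies inside the
   block of len elements; returns false instead of writing out of bounds. */
static bool store_checked(int *buf, size_t len, size_t index, int value)
{
    if (buf == NULL || index >= len) {
        return false;
    }
    buf[index] = value;
    return true;
}
```

Note that free_and_clear only protects the one pointer variable passed to it; any other copies of the address remain invalid and must not be used.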

Casting away constness

If we declare an object as const, we give a promise to the compiler that we will never change the value of that object. In many contexts compilers will spot such an invalid modification and shout at us. But if we cast the constness away as in this snippet:

int const a = 42;
...
int* ap0 = &a;      //< error, compiler will tell us
int* ap1 = (int*)&a; //< silences the compiler
...
*ap1 = 43;          //< UB ==> program crash?

the compiler might not be able to track this invalid access. It then compiles the code to an executable, and the invalid access is detected only at run time, if at all, where it may lead to a program crash.
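It is worth stressing that the UB comes from modifying an object that was itself defined const, not from the cast as such. A minimal sketch of the defined case, with overwrite being a hypothetical helper name:

```c
/* Hypothetical helper: writes through a pointer-to-const by casting the
   qualifier away. This is defined only when the object p points to was
   NOT itself defined with const. */
static void overwrite(const int *p, int value)
{
    *(int *)p = value;
}
```

Calling overwrite(&x, 43) on a plain int x is fine and leaves x equal to 43. Calling it on the address of an object declared like a in the snippet above is the UB case, even though the compiler accepts both calls without complaint.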

category 2

put a title here!

put your explanation here!



Examples of implementation-defined behavior

category 1

put a title here!

put your explanation here!

  • Unfortunately your example for unspecified behavior is not completely correct. If an uninitialized variable is such that it could have been declared with the `register` keyword, the access to it is UB and not UsB. – Jens Gustedt Aug 24 '13 at 17:05
  • Then I have some objection to the wording you are using. "undefined behavior" is not behavior but the absence of it. It is not "undefined behavior" that makes a program erroneous, but the use of a construct etc. in a context where the standard doesn't define a specific behavior. – Jens Gustedt Aug 24 '13 at 17:09
  • @Jens thanks for your comments! As I said I'm not a super-expert and I wasn't aware of that `register` thing (I never had to write C code where that was needed). In the context where I've cited it I needed a quick and dirty example and that seemed appropriate (as I stated: C is full of pitfalls! :-). I'll try to add something to hint about that in that spot. – LorenzoDonati4Ukraine-OnStrike Aug 24 '13 at 17:23
  • @Jens As for your 2nd comment: I'm not completely convinced I said something really wrong. As I stated, this should be a WIKI entry that can be digested also by newbies, and I think what I said was clear. I'll reread the whole thing to see if I can be still clearer without making the wording too heavy on the reader. – LorenzoDonati4Ukraine-OnStrike Aug 24 '13 at 17:25
  • I think the pointer examples for UB are a bit over the top. Wouldn't it be sufficient to say "An example for UB is reading uninitialized memory, because blah blah" – Andreas Grapentin Aug 31 '13 at 19:32
  • Well, I leave this to the people who edit the example items. I'm focused on keeping the general part clean. I suggested some guidelines in the question that I thought could make the rest more uniform and structured. It's up to the editor to be kind enough to strike a balance between his/her writing style and the guidelines, so as to keep this answer tidy (hopefully). As I said, anyone is welcome to contribute. @Jens has already edited and improved my initial answer while still keeping the focus on newbies. That's the kind of contribution I hoped for from the beginning! – LorenzoDonati4Ukraine-OnStrike Aug 31 '13 at 20:08
  • @LorenzoDonati: string literal comparison is unspecified behaviour in C. Visit this: http://melpon.org/wandbox/permlink/eWQZOjwHlMZgrg4y but implementation defined behaviour in C++. See also http://stackoverflow.com/questions/3289354/output-difference-in-gcc-and-turbo-c. You can add this in your answer. – Destructor Nov 18 '15 at 05:46
  • This post is very good, but it lacks references. Could you please improve it by providing links to the sources where we can verify if your definition matches the standard? – aviggiano Feb 26 '16 at 14:34
  • This QA is **way** too long already. It should be split I guess... One should talk strictly about **UB, UsB vs IDB**; then for UB one explaining "how undefined the undefined behaviour can be", and then probably links to other specific questions that have Q and then A that says "this construct has UB" – Antti Haapala -- Слава Україні Apr 16 '16 at 20:55

N1570 is a draft of the ISO C11 standard, very close to the official ISO document.

N1256 is an earlier draft, incorporating the C99 standard plus changes from the three Technical Corrigenda.

Annex J has 5 sections, each of which gathers information that's scattered through the rest of the standard:

  • J.1 Unspecified behavior
  • J.2 Undefined behavior
  • J.3 Implementation-defined behavior
  • J.4 Locale-specific behavior
  • J.5 Common extensions
Keith Thompson