39

There's a handful of situations that the C++ standard classifies as undefined behavior. For example, if I allocate with new[] and then try to free with delete (not delete[]), that's undefined behavior - anything can happen. It might work, it might crash nastily, or it might corrupt something silently and plant a problem that only surfaces later.
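
For instance, a minimal sketch of the kind of mismatch I mean (the class name is made up for illustration):

#include <iostream>

struct Widget {
    ~Widget() { std::cout << "destroyed" << std::endl; }
};

int main() {
    Widget* w = new Widget[10];
    delete w;   // undefined behavior: the array came from new[],
                // so this should have been delete[] w
}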

It's hard to explain this "anything can happen" part to newbies. They start "proving" that "this works" (because it really does work on the C++ implementation they use) and ask "what could possibly be wrong with this?" What concise explanation could I give that would motivate them to just not write such code?

Community
  • 1
  • 1
sharptooth
  • 167,383
  • 100
  • 513
  • 979
  • 5
    The same way you explain anything: "Show, don't tell." – Mitch Wheat Feb 10 '10 at 08:54
  • 1
    Undefined doesn't mean random, it means implementation dependent. Code that uses undefined behavior isn't portable, but it isn't going to behave differently on alternate Fridays either. (Despite what you seem to think). Like Mitch says, if you can _show_ them that the code will break then do so. If you can't then maybe you should entertain the idea that you don't understand the language as well as you think you do. – John Knoeller Feb 10 '10 at 09:39
  • 2
    The usual behaviour of `new[]` then `delete` is a memory leak... no crash, but subtle badness that builds up into a real problem later. So it only 'works' for a pretty low value of 'work'. – Andrew McGregor Feb 10 '10 at 09:40
  • It's pretty easy to show that new[] then delete is going to break things. It's a lousy example of _undefined_ behavior because its brokenness is actually very consistent from compiler to compiler. – John Knoeller Feb 10 '10 at 09:45
  • 16
    @John: Wrong. Undefined *does* very often mean inconsistent and seemingly random. Nonportable but consistent behavior is termed "implementation-defined" or "unspecified" (where the former is documented and the latter is not). Undefined does often mean "will behave completely differently depending on circumstances". Take an out-of-bounds array access. Will it crash? Maybe, if the array just so happened to be allocated at the end of a memory page. Otherwise, it'll simply read/write past the array. Sometimes, that means unallocated memory, sometimes it means memory that's in use. – jalf Feb 10 '10 at 09:48
  • @John Knoeller: On VC++ new[], then delete works just fine for built-in types, so you have to go to lengths to prove that there is a problem. – sharptooth Feb 10 '10 at 09:48
  • @Andrew McGregor: Usual wrong behavior is heap corruption, not a leak. http://stackoverflow.com/questions/1913343/how-could-pairing-new-with-delete-possibly-lead-to-memory-leak-only – sharptooth Feb 10 '10 at 09:50
  • Bitfields, or evaluation order of function arguments are much better examples of undefined behavior. – John Knoeller Feb 10 '10 at 09:51
  • 1
    @John Knoeller: What specifically do you mean by those two much better examples? – sharptooth Feb 10 '10 at 09:53
  • @Jalf: don't be silly. No newbie is going to insist on their right to write code that runs off the ends of array bounds. That sort of undefined behavior is _not_ what sharptooth is worried about. – John Knoeller Feb 10 '10 at 09:53
  • It's possible to actually find compilers that do these things differently in ways that will cause your code to actually notice. – John Knoeller Feb 10 '10 at 09:55
  • 5
    @John Knoeller: There are no different sorts of undefined behavior - it's just UB, and it can surface in any way, period. The problem is not to show how some specific usage of C++ causes problems, the problem is how to explain that undefined behavior is not a thing one wants in one's program. – sharptooth Feb 10 '10 at 09:57
  • No, sharptooth, your problem is when you _can't_ show it. When you can show then it's easy to make your case. It's when you can't show it that it's hard. – John Knoeller Feb 10 '10 at 10:01
  • I will say that claiming that _anything can happen_ doesn't help you to make your case. If I was your junior, I'd just tune you out when you start saying things that are so obviously not true. – John Knoeller Feb 10 '10 at 10:04
  • 2
    @John Knoeller: The problem is actually harder. For example, it works fine on all up-to-date versions of all widely used compilers. It might break when the next version of some of these compilers is released. I can't show it today with an example and if I use some unreasonably rare compiler they won't listen - they'll decide I'm nitpicking or plain crazy. – sharptooth Feb 10 '10 at 10:05
  • Because you are. If it works on all current compilers then it's more likely to become formally part of the spec than it is to suddenly stop working. – John Knoeller Feb 10 '10 at 10:08
  • 11
    @John: Evaluation order of function arguments is not undefined. It's unspecified. Bitfields are not undefined either. new[]/delete is undefined, and out of bounds array accesses are undefined. If you're going to argue this, it might be appropriate to actually look up what "undefined behavior" **is** first. – jalf Feb 10 '10 at 10:38
  • 3
    @John: If you don't know the exact implementation of the compiler, then anything *can* happen. A fully conforming compiler can be written that *does* format your harddrive or send hate mail to your boss whenever it encounters undefined behavior. Unless you know for a fact that your compiler can never do any of those things, then you'd best listen when told that "anything can happen". – jalf Feb 10 '10 at 10:46
  • @jalf: Can you say lawsuit? Only a fool thinks that the only constraint that compiler writers operate under is the C++ spec. Hell, only a fool expects their compiler to be perfectly standards compliant for that matter. – John Knoeller Feb 10 '10 at 10:55
  • 9
    It's probably worth taking John Knoeller as a useful example case of the "smartass newbie" who doesn't accept what undefined behavior is (Of course he's not a newbie - I guess he's playing devil's advocate? Or making the point that newbies could just be temporarily humoured?). You can't threaten to flunk him, so there are some subtle differences, but any advice given here for dealing with students that doesn't work on John should be considered imperfect at best :-) – Steve Jessop Feb 10 '10 at 13:03
  • 4
    @John: can you say "hyperbole?" Obviously U.B won't deliberately format your hard drive - no compiler writer would bother writing the code except as a joke. The real issue is that since the compiler can do anything, you may as well imagine that it will do something bad. Formatting is a humorous exaggeration - it might instead shut down, losing your client millions of dollars. So, what lawsuit? I don't fancy your chances when you sue the compiler vendor because the program shut down when you executed undefined behavior, and you lost the contract forever. Do you? – Steve Jessop Feb 10 '10 at 13:14
  • 3
    @john: Yes, I can say lawsuit. But I fail to see the relevance. Some of us are trying to discuss the actual question posed by the OP. I'm sorry if you fail to see the relevance of what undefined behavior *is*, in a discussion of how to explain to students what undefined behavior is. Sure, you could sue the compiler vendor if their compiler does something unpleasant, but since the compiler never promised to do what your code asked it to, it's hardly a sure win. And whether you win or lose, you're still left with broken code. Sometimes, a lawsuit doesn't solve the problem. – jalf Feb 10 '10 at 13:22
  • 6
    @John: Only a fool expects the compiler to behave sensibly beyond what is guaranteed by the C++ spec. Indeed, only a fool is able to turn "the compiler doesn't follow the C++ spec entirely in every detail", which is a perfectly valid and reasonable premise, into a claim that "therefore it is ok to rely on things that neither the compiler nor the C++ spec guarantee", which is what you're doing. And which is complete and utter nonsense. – jalf Feb 10 '10 at 13:23
  • 1
    Even though I know you're asking how to explain to someone rather than an explanation yourself, isn't this still a dupe of any question asking for explanation of undefined behavior? The how-to-explain distinction doesn't seem enough, combined with the fact this question seems more like a rant---and all this even though I share your frustration and agree with your basic premise. (Can I ask how to explain recursion to know-it-all newbies next?) –  Feb 11 '10 at 08:29
  • 5
    @Roger Pate: UB is quite a specific thing - many people just don't believe in it. The chain of comments above proves this better than any words I could possibly say. – sharptooth Feb 11 '10 at 10:05
  • @jalf: while I agree with everything you've said, be careful with that last argument. Some compilers guarantee certain behavior the spec says is UB, such as MSVC when given overlapping ranges to std::copy. – Mooing Duck May 29 '12 at 14:34

19 Answers

52

Undefined means explicitly unreliable. Software should be reliable. You shouldn't have to say much else.

A frozen pond is a good example of an undefined walking surface. Just because you make it across once doesn't mean you should add the shortcut to your paper route, especially if you're planning for the four seasons.

Alex Neth
  • 3,326
  • 2
  • 26
  • 36
34

Two possibilities come to my mind:

  1. You could ask them "just because you can drive on the motorway in the opposite direction at midnight and survive, would you do it regularly?"

  2. The more involved solution might be to set up a different compiler / run environment to show them how it fails spectacularly under different circumstances - see the sketch after this list for one candidate snippet.
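
One candidate snippet for point 2, assuming signed integer overflow as the undefined operation; the observable behavior has been reported to differ between compilers and optimization levels (printing wrapped-around values on some builds, looping forever on others), and none of those outcomes is guaranteed:

#include <iostream>

int main() {
    // i * 1000000000 overflows int (on typical 32-bit-int platforms) once
    // i reaches 3, which is undefined behavior; optimizers are free to
    // assume it never happens, so different builds can behave very differently.
    for (int i = 0; i < 4; ++i)
        std::cout << i * 1000000000 << std::endl;
}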

Péter Török
  • 114,404
  • 31
  • 268
  • 329
  • 4
    The difficulty with solution 2 is that you may find that none of the compilers available actually fails spectacularly, but the behaviour is still not defined and therefore cannot be depended upon. Too much code is written with "I cannot find somewhere that it breaks", and while that may currently be the case, something could change (platform, compiler, ...) that will make it break in the future - and there is no bug in the system because the behaviour is undefined (so anything is correct). Also search for 'nasal demons' in the comp.std.c news group. – Jonathan Leffler Feb 10 '10 at 20:56
23

"Congratulations, you've defined the behavior that compiler has for that operation. I'll expect the report on the behavior that the other 200 compilers that exist in the world exhibit to be on my desk by 10 AM tomorrow. Don't disappoint me now, your future looks promising!"

Ignacio Vazquez-Abrams
  • 776,304
  • 153
  • 1,341
  • 1,358
  • 21
    The bigger problem is that the same compiler might exhibit a different behavior when compiled with slightly different source or options (or even at a different time of day or on a different machine). So for undefined behavior you can't even really say 'that's what that particular compiler does' - that's what 'implementation specified' behavior is for. – Michael Burr Feb 10 '10 at 09:01
  • Where does the myth that "that's what implementation-specified behavior is for" come from? According to the authors of the Standard, "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. *It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior*." Note that when C89 was ratified, many actions that weren't officially defined by the Standard were processed 100% consistently by implementations with certain traits... – supercat Feb 06 '20 at 20:09
  • ...that would basically leave them no wiggle room in their behavior, a fact observed and even in at least one case somewhat relied upon by the authors of the Standard. – supercat Feb 06 '20 at 20:16
13

Simply quote from the standard. If they can't accept that, they aren't C++ programmers. Would Christians deny the bible? ;-)

1.9 Program execution

  1. The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. [...]

  2. Certain aspects and operations of the abstract machine are described in this International Standard as implementation-defined (for example, sizeof(int)). These constitute the parameters of the abstract machine. Each implementation shall include documentation describing its characteristics and behavior in these respects. [...]

  3. Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine. [...]

  4. Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [ Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. —end note ]

You can't get any clearer than that.

fredoverflow
  • 256,549
  • 94
  • 388
  • 662
11

I'd explain that if they didn't write the code correctly, their next performance review would not be a happy one. That's sufficient "motivation" for most people.

3

Turn the person into a pointer. Tell them that they are a pointer to a class Human and you are invoking the function 'RemoveCoat'. When they are pointing at a person and saying 'RemoveCoat', all is fine. If the person does not have a coat, no worries - we check for that; all RemoveCoat really does is remove the top layer of clothing (with decency checks).

Now what happens if they are pointing somewhere random and they say RemoveCoat - if they are pointing at a wall then the paint might peel off, if they are pointing at a tree the bark might come off, dogs might shave themselves, the USS Enterprise might lower its shields at a critical moment etc!

There is no way of working out what might happen; the behaviour has not been defined for that situation. This is called undefined behaviour, and it must be avoided.
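
A rough translation of the analogy into code, with the class and member names invented for illustration:

#include <iostream>

class Human {
public:
    void RemoveCoat() { std::cout << "Coat removed." << std::endl; }
};

int main() {
    Human alice;
    Human* p = &alice;
    p->RemoveCoat();     // fine: p points at an actual Human

    Human* q;            // uninitialized: points "somewhere random"
    // q->RemoveCoat();  // undefined behavior: it might appear to work,
                         // crash, or quietly damage something else
}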

Stefan
  • 3,669
  • 2
  • 32
  • 43
3

Let them try their way until their code crashes during testing. Then the words won't be needed.

The thing is that newbies (we've all been there) have some amount of ego and self-confidence. That's okay. In fact, you couldn't be a programmer if you didn't. It's important to educate them, but no less important to support them and not to cut their journey short at the start by undermining their trust in themselves. Just be polite, but prove your position with facts, not with words. Only facts and evidence will work.

  • 2
    Yikes! Too much undefined behavior just happens to work for whatever they might want to do (they screw around with the code until "it works on my machine") - until they've moved on somewhere else and the tools are updated. Suddenly the bad aspects of the undefined behavior blow up in the face of whatever chump is still around maintaining the stuff. – Michael Burr Feb 10 '10 at 09:05
  • 2
    @Michael: I still think that "Developer Art" has a good point. There's no reason to freak out when discussing undefined behavior -- it's bad enough to be discouraged on the merits of the case. It's important to explain calmly and clearly why it's bad. The goal is to discourage the behavior, not the person. – JXG Feb 10 '10 at 10:26
  • @Michael: well, if this was a newbie in a professional setting, submitting code that I thought could blow up in future, then it wouldn't pass code review. If they continue to submit code that they know won't pass review, because they think they know better than their boss how to write C++, that's an issue for the gentle end of the disciplinary procedure. If they're actually a student, and not writing code that has to be maintained in future, then tasking them to port their crappy code to a compiler on which it breaks might be enough to illustrate the point - deal with their own fallout. – Steve Jessop Feb 10 '10 at 13:09
  • @Steve: not all code shops have code reviews (disregard the why of that) so in those cases it's important to have all committers write code in a reasonably safe manner. Also, I've had jobs with newbie coders nearby who coded away for a year or more on their own project, without code review (with approval from above). Their ways are set then, they tend to do quirky things and it's extremely hard to change their attitude and basically, whatever talent they had is mostly wasted. (They can still code, and make stuff work, but they will never be true professionals...) – Macke Feb 10 '10 at 20:58
  • True. Even if a company doesn't review all code, or regularly review all coders, it should certainly review code from new coders, and keep reviewing it until they get it more or less right (by whatever standards the company holds - I'm assuming the company's guidelines support sharptooth's expectation that "works for me" isn't good enough). New hires are on probation, surely? Code review can fall by the wayside, but you need to have some grounds for confidence in the coder in the first place, so if their code continually introduces risks for the future, you want to know about it. – Steve Jessop Feb 10 '10 at 22:26
3

John Woods:

In short, you can't use sizeof() on a structure whose elements haven't been defined, and if you do, demons may fly out of your nose.

"Demons may fly out of your nose" simply must be part of the vocabulary of every programmer.

More to the point, talk about portability. Explain how programs frequently have to be ported to different OSes, let alone different compilers. In the real world, the ports are usually done by people other than the original programmers. Some of these ports are even to embedded devices, where there can be enormous costs of discovering that the compiler decided differently from your assumption.

JXG
  • 7,263
  • 7
  • 32
  • 63
2

Quietly override new, new[], delete and delete[] and see how long it takes him to notice ;)

Failing that ... just tell him he is wrong and point him towards the C++ spec. Oh yeah .. and next time be more careful when employing people to make sure you avoid a-holes!
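
In case it helps, a sketch of what the quiet override might look like - a minimal logging version only (real mismatch detectors do considerably more), assuming a C++11 compiler:

#include <cstdio>
#include <cstdlib>
#include <new>

// Replace the global allocation functions so every call is logged;
// a block obtained from new[] that later shows up in plain delete
// is the giveaway.
void* operator new(std::size_t n) {
    void* p = std::malloc(n);
    if (!p) throw std::bad_alloc();
    std::printf("new      %p (%zu bytes)\n", p, n);
    return p;
}
void* operator new[](std::size_t n) {
    void* p = std::malloc(n);
    if (!p) throw std::bad_alloc();
    std::printf("new[]    %p (%zu bytes)\n", p, n);
    return p;
}
void operator delete(void* p) noexcept {
    std::printf("delete   %p\n", p);
    std::free(p);
}
void operator delete[](void* p) noexcept {
    std::printf("delete[] %p\n", p);
    std::free(p);
}

int main() {
    int* a = new int[4];
    delete a;   // logged as plain delete on a block that came from new[]
}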

Goz
  • 61,365
  • 24
  • 124
  • 204
2

One would be...

"This" usage is not part of the language. If we would say that in this case the compiler must generate code that crashes, then it would be a feature, some kind of requirement for the compiler's manufacturer. The writers of the standard did not wanted to give unnecessary work on "features" that are not supported. They decided not to make any behavioral requirements in such cases.

Notinlist
  • 16,144
  • 10
  • 57
  • 99
  • From the C99 Rationale [but I think the quote goes back to C89]: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior". C++ described Undefined Behavior in much the same way as C, so I think the intended meaning was similar even though compilers prefer to use UB as an excuse to go off the deep end. – supercat Feb 15 '20 at 20:48
2

I like this quote:

Undefined behavior: it may corrupt your files, format your disk or send hate mail to your boss.

I don't know who to attribute this to (maybe it's from Effective C++)?

Manuel
  • 12,749
  • 1
  • 27
  • 35
  • 2
    but if their argument is "look, I just compiled and ran the program and it worked", why would your quote change their minds? – jalf Feb 10 '10 at 10:29
  • @jalf - well you simply have to replace "your" with "our client's" in the quote – Manuel Feb 10 '10 at 10:36
  • But again, they just *showed* that it worked. Who cares what some dusty old quote says? Why shouldn't they trust that if it worked when they ran the program, it's *always* going to work? – jalf Feb 10 '10 at 10:43
  • They didn't show that it works in the computers of all potential clients, nor that it works if compiled by another compiler or even by a previous/future version of the same compiler. If they can prove *that*, then I'd guess I'd just rest my case – Manuel Feb 10 '10 at 11:05
2

C++ is not really a language for dilettantes, and simply listing out some rules and making them obey without question will make for some terrible programmers; most of the stupidest things I see people say are probably related to this kind of blind rule-following/lawyering.

On the other hand, if they know that the destructors won't get called and that there may be other problems, then they will take care to avoid it. More importantly, they'll have some chance to debug it if they ever do it by accident, and also some chance to realize how dangerous many of the features of C++ can be.

Since there's many things to worry about, no single course or book is ever going to make someone master C++ or probably even become that good with it.

1

Just show them Valgrind.
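
For example, a short program to run under it; the exact wording of the report varies by Valgrind version, but memcheck typically flags the mismatch with something like "Mismatched free() / delete / delete []":

// demo.cpp - compile with debug info and run under valgrind:
//   g++ -g demo.cpp -o demo && valgrind ./demo
int main() {
    int* p = new int[8];
    delete p;   // memcheck reports the new[]/delete mismatch here
}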

Piotr Justyna
  • 4,888
  • 3
  • 25
  • 40
1

Compile and run this program:

#include <iostream>

class A {
public:
    A() { std::cout << "hi" << std::endl; }
    ~A() { std::cout << "bye" << std::endl; }
};

int main() {
    A* a1 = new A[10];
    delete a1;      // undefined behavior: allocated with new[], freed with plain delete

    A* a2 = new A[10];
    delete[] a2;    // correct: all ten destructors run
}

At least with GCC, it shows that the destructor gets called for only one of the elements when using a plain delete.

As for plain delete on POD arrays: point them to a C++ FAQ or have them run their code through cppcheck.

Johan Kotlinski
  • 25,185
  • 9
  • 78
  • 101
1

One point not yet mentioned about undefined behavior is that if performing some operation would result in undefined behavior, a standards-conforming implementation could legitimately, perhaps in an effort to be 'helpful' or improve efficiency, generate code that would fail if such an operation were attempted.

For example, one can imagine a multi-processor architecture in which any memory location may be locked, and attempting to access a locked location (except to unlock it) stalls until the location in question is unlocked. If locking and unlocking were very cheap (plausible if they're implemented in hardware), such an architecture could be handy in some multi-threading scenarios, since implementing x++ as (atomically read and lock x; add one to the read value; atomically unlock and write x) would ensure that if two threads both performed x++ simultaneously, the result would be to add two to x. Provided programs are written to avoid undefined behavior, such an architecture might ease the design of reliable multi-threaded code without requiring big clunky memory barriers.

Unfortunately, a statement like *x++ = *y++; could cause deadlock if x and y both referred to the same storage location and the compiler attempted to pipeline the code as: t1 = read-and-lock x; t2 = read-and-lock y; read t3 = *t1; write *t2 = t3; t1++; t2++; unlock-and-write x = t1; unlock-and-write y = t2. While the compiler could avoid deadlock by refraining from interleaving the various operations, doing so might impede efficiency.

supercat
  • 77,689
  • 9
  • 166
  • 211
0

Turn on malloc_debug and delete an array of objects with destructors; freeing a pointer that points inside the block should fail. Call them all together and demonstrate this.

You'll need to think of other examples to build your credibility until they understand that they are newbies and there's a lot to know about C++.

Potatoswatter
  • 134,909
  • 25
  • 265
  • 421
0

Tell them about standards and how tools are developed to comply with the standards. Anything outside the standard might or might not work - that is what UB means.

piotr
  • 5,657
  • 1
  • 35
  • 60
0

The fact that their program appears to work is a guarantee of nothing; the compiler could generate code that happens to work (how do you even define "work" when the correct behavior is undefined?) on weekdays but formats your disk on weekends. Did they read the source code to their compiler? Examine their disassembled output?

Or remind them just because it happens to "work" today is no guarantee of it working when you upgrade your compiler version. Tell them to have fun finding whatever subtle bugs creep up from that.

And really, why not? They should be providing a justifiable argument to use undefined behavior, not the other way around. What reason is there to use delete instead of delete[] other than laziness? (Okay, there's std::auto_ptr. But if you're using std::auto_ptr with a new[]-allocated array, you probably ought to be using a std::vector anyway.)
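
A sketch of that last point; assuming the element count is only known at run time, std::vector removes the question of which delete to use entirely:

#include <cstddef>
#include <vector>

// Manual management: the programmer has to remember delete[] and to
// run it on every path out of the function.
void manual(std::size_t n) {
    int* buf = new int[n];
    // ... use buf ...
    delete[] buf;
}

// Letting std::vector own the storage: no delete of any kind is needed,
// and cleanup happens even if an exception is thrown.
void automatic(std::size_t n) {
    std::vector<int> buf(n);
    // ... use buf[i] or buf.data() ...
}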

jamesdlin
  • 81,374
  • 13
  • 159
  • 204
  • In situations where parts of the Standard and an implementation's documentation together describe a behavior even though some other part of the Standard says it's "undefined", one would describe "working" as "behaving as described, even though the Standard doesn't require it". The extremely vast majority of situations where people complain about how compilers process UB involve actions whose behavior is described, but which are then classified as UB anyway. – supercat Oct 26 '19 at 17:51
0

Both the C and C++ Standards use the term "Undefined Behavior" to refer to situations in which it may be useful for different implementations to process constructs in differing, incompatible, fashions, some of which will behave predictably but some of which may not. Both use the same terminology to describe UB, and while I don't know of any published Rationale for C++ Standards, the Rationale for the C Standard says:

"Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior."

Note that many actions which were classified as Undefined Behavior by the C Standard were considered fully defined on many if not all implementations, but the authors of the Standard wanted to give implementors targeting unusual platforms or application fields the ability to deviate from the normal behaviors if doing so would benefit their customers. Such freedom was not intended to invite arbitrary and capricious deviations from precedent that make it harder for programmers to quickly and easily do what needed to be done.

Unfortunately, many programmers who use gcc and clang don't understand their own needs as well as the maintainers of those compilers do. Those maintainers recognize that since the Standard avoids mandating anything that would impair the efficiency of applications that will never receive maliciously crafted inputs, or that will only run in contexts where even malicious programs would be unable to do any damage, there is supposedly no need for any implementation to allow programmers to easily and efficiently write programs that are suitable for use in other contexts.

supercat
  • 77,689
  • 9
  • 166
  • 211