27

Many bad things happened and continue to happen (or not, who knows, anything can happen) due to undefined behavior. I understand that this was introduced to leave some wiggle-room for compilers to optimize, and maybe also to make C++ easier to port to different platforms and architectures. However, the problems caused by undefined behavior seem too large to be justified by these arguments. What are other arguments for undefined behavior? If there are none, why does undefined behavior still exist?

Edit To add some motivation for my question: Due to several bad experiences with less C++-crafty co-workers I have gotten used to making my code as safe as possible. Assert every argument, rigorous const-correctness and stuff like that. I try to leave as little room as possible to use my code the wrong way, because experience shows that, if there are loopholes, people will use them, and then they will call me about my code being bad. I consider making my code as safe as possible a good practice. This is why I do not understand why undefined behavior exists. Can someone please give me an example of undefined behavior that cannot be detected at runtime or compile time without considerable overhead?

Björn Pollex
  • 5
    Undefined behavior seems to be quite the rage these days... Spring time philosophy spirit ? – Matthieu M. May 05 '10 at 09:15
  • 1
    If you replace all the compile time 'undefined behavior' with 'shall not translate' and all the run time 'undefined behavior' with 'calls abort()' or similar, as an application writer you'd still have to avoid whatever construct caused it. If you want to define the behavior to be something less drastic in certain situations then that's no different whether you have UB or not. You have to define (and get everyone else to agree to) behavior in situations where there are currently no requirements on implementations. – CB Bailey May 05 '10 at 09:22
  • @Matthieu M.: This rage that can currently be observed is what inspired this question. – Björn Pollex May 05 '10 at 09:37
  • There's a question about what can be caused by undefined behavior: http://stackoverflow.com/questions/908872/whats-the-worst-example-of-undefined-behaviour-actually-possible Not sure if it's a dupe though. – P Shved May 05 '10 at 09:53
  • Here's a relevant question: http://stackoverflow.com/questions/2235457/how-to-explain-undefined-behavior-to-know-it-all-newbies – sharptooth May 05 '10 at 10:30
  • 2
    As things are today, @Charles, if you accidentally invoke undefined behavior, you might get a program that works exactly as intended on one system, but which returns subtly wrong answers on another system. If the undefined behavior were defined, then at least you'll know something's wrong immediately, either due to crashing or due to consistently wrong results. – Rob Kennedy May 05 '10 at 13:13
  • 1
    @CharlesBailey: If a program is supposed to convert audiovisual files from one format to another, it may be acceptable if malformed input files generate output files full of "random" pixels and sounds, or if they cause the program to exit, but that doesn't mean it would be acceptable for them to reformat the hard drive. Allowing code to specify what behaviors would and would not be acceptable in case of overflow would allow some easy optimizations far beyond those that would be possible on a program which rigidly inspected all inputs to ensure no overflows could occur. – supercat Jun 23 '15 at 22:26
  • One of the most important benefits of signed `int` overflow being UB is that indexing arrays with an `int` loop counter can still optimize that to a pointer increment or whatever, without extra checks that the loop will definitely terminate without wrapping. See [Is there some meaningful statistical data to justify keeping signed integer arithmetic overflow undefined?](https://stackoverflow.com/q/56047702) and http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html – Peter Cordes Oct 18 '22 at 10:20

11 Answers

11

I think the heart of the concern comes from the C/C++ philosophy of speed above all.

These languages were created at a time when raw power was sparse and you needed to get all the optimizations you could just to have something usable.

Specifying how to deal with UB would mean detecting it in the first place, and then of course specifying the proper handling. However, detecting it goes against the speed-first philosophy of the languages!

Today, do we still need fast programs? Yes, for those of us working either with very limited resources (embedded systems) or with very harsh constraints (on response time or transactions per second), we do need to squeeze out as much as we can.

I know the motto: throw more hardware at the problem. We have an application where I work:

  • expected time for an answer? Less than 100 ms, with DB calls in the midst (say thanks to memcached).
  • number of transactions per second? 1200 on average, peaks at 1500/1700.

It runs on about 40 monsters: 8 dual-core Opterons (2800 MHz) with 32 GB of RAM. It gets difficult to be "faster" with more hardware at this point, so we need optimized code, and a language that allows it (we did refrain from throwing assembly code in there).

I must say that I don't care much for UB anyway. If you get to the point that your program invokes UB, then it needs fixing, whatever behavior actually occurred. Of course it would be easier to fix such bugs if they were reported straight away: that's what debug builds are for.

So perhaps, instead of focusing on UB, we should learn to use the language:

  • don't use unchecked calls
  • (for experts) don't use unchecked calls
  • (for gurus) are you sure you really need an unchecked call here?

And everything is suddenly better :)
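
To make the checked/unchecked distinction above concrete, here is a small sketch of my own (not from the answer), contrasting std::vector's unchecked operator[] with the checked at():

#include <iostream>
#include <stdexcept>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3};

    // Unchecked call: no bounds check, so out-of-range access is undefined behaviour,
    // but correct in-range code pays nothing extra.
    int a = v[1];

    // Checked call: costs a bounds check, but turns the error into a defined exception.
    try {
        int b = v.at(10);                  // out of range
        std::cout << a + b << '\n';
    } catch (const std::out_of_range& e) {
        std::cout << "caught: " << e.what() << '\n';
    }
}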

Matthieu M.
  • 1
    Okay, but this assumes you can avoid undefined behavior simply by *deciding* to do so. You can get undefined behavior from a typo, and then, technically, you could destroy your system when you compile and run the first time. – Kyle Strand Aug 21 '15 at 05:48
  • @KyleStrand: Certainly detecting UB is difficult. Compilers detect some instances, Static Analyzers too, and then there is run-time instrumentation (Sanitizers) and other inspection tools (ALF, Valgrind) for test runs... This is why I follow Rust these days, as I am afraid that C++ is just never going to manage to avoid UB (it wasn't designed to, and this cannot be retrofitted it seems). – Matthieu M. Aug 21 '15 at 06:16
  • 2
    I'm glad you mentioned Rust. It gives me hope for the future of programming. – Kyle Strand Aug 21 '15 at 06:18
  • C wasn't designed to prioritize speed above everything else, but rather to let programmers exploit low-level constructs, which historically could often accomplish many tasks faster than would have otherwise be possible. When the Standard suggests that implementations may process code "in a documented manner characteristic of the environment" in circumstances where the Standard imposes no requirements, that's not just a theoretical possibility. What made the language useful was historically that implementations would behave in that fashion in cases where doing so would make sense and... – supercat Aug 11 '21 at 21:13
  • ...there was no reason to do otherwise. – supercat Aug 11 '21 at 21:14
10

My take on undefined behavior is this:

The standard defines how the language is to be used, and how the implementation is supposed to react when used in the correct manner. However, it would be a lot of work to cover every possible use of every feature, so the standard just leaves it at that.

However, in a compiler implementation, you can't just "leave it at that": the code has to be turned into machine instructions, and you can't just leave blank spots. In many cases, the compiler can throw an error, but that's not always feasible: there are some instances where it would take extra work to check whether the programmer is doing the wrong thing (for instance: calling a destructor twice -- to detect this, the compiler would have to count how many times certain functions have been called, or add extra state, or something). So if the standard doesn't define it, and the compiler just lets it happen, weird things can sometimes happen, maybe, if you're unlucky.
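
To make the destructor example concrete, here is a minimal sketch of my own (not from the answer) showing why catching it would need extra per-object bookkeeping:

struct Widget {
    int* data;
    Widget() : data(new int[10]) {}
    ~Widget() { delete[] data; }    // frees the buffer
};

int main() {
    Widget w;
    w.~Widget();   // explicit destructor call: the buffer is freed once
    // w's destructor runs again automatically at the end of scope, freeing the
    // same buffer twice: undefined behaviour. Diagnosing this would require the
    // compiler or runtime to track how many times each destructor has run.
}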

Carson Myers
  • 1
    Well, Java for instance provides a reference implementation. This is as clear a definition as it gets. Why is this not done here? If it has to be defined at some point, why not define it as early as possible? – Björn Pollex May 05 '10 at 09:38
  • 5
    I'm not going to pretend to be an expert on all C++ internals, but the fact that C++ programs run closer to the metal than java is probably a huge reason. The language spec has to leave room for implementations on vastly different hardware. – Carson Myers May 05 '10 at 09:43
  • 1
    It's more a matter of philosophy. C++ means speed, and throw caution to the wind. Extra checks don't mesh in this philosophy. – Matthieu M. May 05 '10 at 09:52
  • 1
    @Matthieu: Not precisely. C++ means allowing speed, and having caution as an option. As Stroustrup put it, you can build safety on top of a fast implementation, but you can't build speed on top of a safe implementation, assuming that safety and speed conflict. If you want fast access to elements of a vector, you use `[]`. If you want checked access, you use `.at()`. They're both in the Standard. – David Thornley Jun 10 '10 at 17:22
  • 3
    I don't argue with the idea of offering both, I argue with the fact that most developers don't need speed but use the idiomatic way of accessing an index `[]`, which is also unsafe... And thus I would prefer having a safe idiomatic way, and an unsafe other way for those who really need speed. – Matthieu M. Jun 11 '10 at 06:24
  • "As Stroustrup put it, you can build safety on top of a fast implementation, but you can't build speed on top of a safe implementation, assuming that safety and speed conflict.": someone also said that premature optimization is the root of all evil. One could write correct software first, and then optimize bottlenecks later (e.g. calling an external routine written in C or assembly, if needed). IMO thinking about efficiency first and about safety later is not the best approach: in my experience it is easier to make a correct program faster than it is to make a fast program correct. – Giorgio Feb 24 '13 at 13:56
  • @Giorgio: Agreed 100%. The way C handles pointers and arrays may have helped low-level optimizations in the 1970s through the 1990s, but acts as a major impediment to higher-level optimizations which would be useful today. – supercat Jun 23 '15 at 22:46
6

The problems are not caused by undefined behaviour; they are caused by writing the code that leads to it. The answer is simple: don't write that kind of code. Avoiding it is not exactly rocket science.

As for:

an example of undefined behavior that cannot be detected at runtime or compile time without considerable overhead

A real world issue:

int * p = new int;
// call loads of stuff, somewhere in which an alias to p is created:
int * q = p;
delete p;

// call more stuff, somewhere in which you do:
delete q;    // double delete: undefined behaviour

Detecting this at compile time is impossible. At run-time it is merely extremely difficult and would require the memory allocation system to do far more book-keeping (i.e. be slower and take up more memory) than is the case if we simply say the second delete is undefined. If you don't like this, perhaps C++ is not the language for you - why not switch to Java?

  • 3
    That is true, however, if a "feature" leads to so many wasted hours of work, then you've got to wonder why it is not just removed. There has to be a pretty impressive upside to justify all that. – Björn Pollex May 05 '10 at 09:23
  • undefined behavior is not a "feature" – Carson Myers May 05 '10 at 09:26
  • Why then, what exactly is it? – Björn Pollex May 05 '10 at 09:33
  • 4
    @Space_C0wb0y it's behavior that has not been defined because it was inconvenient for the committee, it would harm the portability of the language or make the compilers impossibly hard to write. I mean... it's not a feature, it's a lack of certain features that weren't needed or were implausible. – Carson Myers May 05 '10 at 09:38
  • So it comes down to politics? That is disappointing but expected. – Björn Pollex May 05 '10 at 09:40
  • 2
    Your answer is from the programmer perspective, while the question is posed from a language design perspective. Q:"Why does this bad thing exist?" A:"Avoid it" – Ross May 05 '10 at 09:41
  • 1
    I don't really see it as politics, as far as the committee thing goes. I mean there's only so much time to write the language spec, and it already takes a really long time. Defining all the things you're not supposed to do anyway seems like a silly waste of time. – Carson Myers May 05 '10 at 09:47
  • If one were to gather statistics about the amount of money wasted due to bugs that can be traced back to undefined behavior, I am certain you could pay a lot of smart people to write very detailed specs with that. – Björn Pollex May 05 '10 at 09:51
  • 12
    No, it is not politics, but rather engineering. Not everything can be checked within reasonable terms. Say that dereferencing an invalid pointer is changed from undefined behavior to a known error. Then the standard would be **requiring** all implementations to perform checks around each and every pointer dereference to produce that error. And I am not just talking about dereferencing null, but **all** pointers. Whenever you see `*p`, you would have to verify that `p` is a pointer to a valid block of memory, requiring the runtime to track all allocated memory for that check. – David Rodríguez - dribeas May 05 '10 at 09:52
  • I second David. Think of all the money which is saved by not doing those extra checks. Think of all the things that are possible by not doing those extra checks. Millions are spent on programmer errors... but a NullPointerException is an error too, and money is lost if your program was not carved out for an exception-throwing environment. So in the end, UB does not affect the money lost, careless programming does whatever the language.... and it's just hard to program correctly :) – Matthieu M. May 05 '10 at 09:57
  • 5
    On the amount of money that is lost tracking bugs: Undefined Behavior from the standard point of view does not mean that it has to be undefined in your particular implementation. Many implementations have specific code in debug builds to diagnose errors. Different implementations will offer greater diagnostics support to try and grab a bigger piece of the market. Not having it standardized means that the same implementation can do bounds checking on iterators in debug mode and at the same time have a fast unchecked release version. – David Rodríguez - dribeas May 05 '10 at 09:58
  • "make the compilers impossibly hard to write.": But then it is harder to write correct applications. I would rather spend some time to write a more stricter compiler and than save much more time when writing application software using that compiler: it is better to do the effort only once. – Giorgio Feb 24 '13 at 13:51
5

The main source of undefined behaviour is pointers, and that's why C and C++ have a lot of undefined behaviour.

Consider this code:

char * r = reinterpret_cast<char *>(0x012345ff);
std::cout << r;   // reads from that address as a NUL-terminated string

This code looks very bad, but should it issue an error? What if that address is indeed readable i.e. it's a value I obtained somehow (maybe a device address, etc.)?

In cases like this, there's no way to know if the operation is legal or not, and if it isn't, its behaviour is indeed unpredictable.

Apart from this: in general C++ was designed with "The zero overhead rule" in mind (see The Design and Evolution of C++), so it couldn't possibly impose any burden on implementations to check for corner cases etc. You should always keep in mind that this language was designed and is indeed used not only on the desktop but also in embedded systems with limited resources.

UncleZeiv
4

Many things that are defined as undefined behavior would be hard if not impossible to diagnose by the compiler or runtime environment.

The ones that are easy have already turned into defined-undefined behavior. Consider calling a pure virtual method: it is undefined behavior, but most compilers/runtime environments will report an error in the same terms: pure virtual method called. The de facto standard is that calling a pure virtual method is a runtime error in all environments I know of.
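
As a sketch of how such a call typically arises in practice (my example, not part of the answer): a virtual call made indirectly from a base-class constructor, before the derived part of the object exists.

#include <iostream>

struct Base {
    Base() { helper(); }             // runs while the object is still just a Base
    void helper() { do_work(); }     // the indirection hides the problem from the compiler
    virtual void do_work() = 0;      // pure virtual
    virtual ~Base() = default;
};

struct Derived : Base {
    void do_work() override { std::cout << "working\n"; }
};

int main() {
    Derived d;   // most implementations abort with "pure virtual method called"
}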

David Rodríguez - dribeas
  • And why does nobody make the defacto standard a real standard? – Björn Pollex May 05 '10 at 09:36
  • What difference would that make? That is, the implementation is standard compliant (well, anything is compliant with UB), and users get the information that they need. What need is there to modify the standard? If it ain't broken don't fix it --you might just break it in a different way. – David Rodríguez - dribeas May 05 '10 at 09:46
  • 2
    Keep in mind that if you want to "standardize" it, you're rapidly creating more problems than you solve. Must the message go to std::cout or std::cerr? What if they're redirected? What if the pure virtual call happens inside the redirecting `streambuf`? Must the message be in English or can it be localized? And most damning: my application's users won't understand it anyway. – MSalters May 06 '10 at 09:31
  • @MSalters: the pure-virtual call is in fact easy to define to call `std::terminate()`, which is better than undefined behavior (it's better to reboot the mars rover than let it go in an unpredictable way and get stuck in a rock). – Yakov Galka Sep 03 '11 at 13:51
  • @ybungalobill: *undefined behavior* does not mean that it has to be *unpredictable*. As a matter of fact all compilers that I know provide a sensible diagnostic message when you call a pure virtual method. By *mandating* a call to `std::terminate` you are not really helping, as the terminate handler can be set by the user and it does not have any means of knowing *what* caused the system to call `terminate`. – David Rodríguez - dribeas Sep 03 '11 at 16:17
  • @David: undefined behavior *is* unpredictable based on the scope of the document that leaves it undefined. It means that you cannot reason ([prove](http://en.wikipedia.org/wiki/Mathematical_proof)) about the state of the machine after it invokes undefined behavior. Of course the implementation (or another derived standard, e.g. POSIX) might change some behavior from undefined to defined, in which case you get perfectly predictable results on that implementation. – Yakov Galka Sep 03 '11 at 17:23
  • @David: 'knowing what caused the system to call terminate' is useful for debugging only, but the former is useful for saving a few million dollars, saving human lives etc... – Yakov Galka Sep 03 '11 at 17:24
  • @ybungalobill: *Undefined behavior* means that the behavior of the system is undefined by the standard, and that does not *necesarily* mean that the implementation cannot do better. Calling a pure virtual method is *undefined behavior* and yet, all the implementations I have used provide the guarantee of a clean crash with a sensible message. That is what is called *quality of implementation*, and it means that in my compiler (fill in the name: gcc, clang, vs, intel) the compiler *guarantees* something that the standard does not define. – David Rodríguez - dribeas Sep 03 '11 at 21:56
  • I think you misunderstood the comment I made, so I will try to rephrase it from the opposite point of view: If the standard *mandated* calling `terminate` in that situation, none of those implementations would be allowed to do anything else but calling `terminate`, and in that case it would be much harder to *reason*. The sentence about *saving a few million dollars, saving human lives...* is pure **demagogy**. Next time you are on a plane think how glad you would be if the avionics just crashed and restarted in the middle of the flight. When systems are critical the only solution is testing. – David Rodríguez - dribeas Sep 03 '11 at 22:02
  • @David: I don't exactly understand what we are arguing about... I think that UB is good. I just said that the case of pure-virtual call (unlike dozens of other aspects of C++) is simple to be defined, without any performance penalty. And it's not demagogy. My work is in fact related to avionics. And failures happen, both hardware and software. Testing is probabilistic, proving is not. Restarting the system (which is guaranteed to complete in say, 10 seconds) would leave you in the land of determinacy, which may be your best bet. – Yakov Galka Sep 03 '11 at 22:24
  • @ybungalobill: Is calling a pure virtual function *undefined* in your implementation? Go to the docs and check. You might find out that it is *fully defined*. I don't really know what we are discussing either, but my point is that if you *define* a behavior in the standard, it better be good enough to help you debugging the error, and that is not *calling `terminate`*. The problem of *defining* a particular behavior there is that you don't give freedom to provide a *different better* behavior. As you already said, *undefined* only means *undefined* when no other guarantee is provided. – David Rodríguez - dribeas Sep 03 '11 at 22:40
3

The standard leaves "certain" behaviour undefined in order to allow a variety of implementations, without burdening those implementations with the overhead of detecting "certain" situations, or burdening the programmer with constraints required to prevent those situations arising in the first place.

There was a time when avoiding this overhead was a major advantage of C and C++ for a huge range of projects.

Computers are now several thousand times faster than they were when C was invented, and the overheads of things like checking array bounds all the time, or having a few megabytes of code to implement a sandboxed runtime, don't seem like a big deal for most projects. Furthermore, the cost of (e.g.) overrunning a buffer has increased by several factors, now that our programs handle many megabytes of potentially-malicious data per second.

It is therefore somewhat frustrating that there is no language which has all of C++'s useful features, and which in addition has the property that the behaviour of every program which compiles is defined (subject to implementation-specific behaviour). But only somewhat - it's not actually all that difficult in Java to write code whose behaviour is so confusing that from the POV of debugging, if not security, it might as well be undefined. It's also not at all difficult to write insecure Java code - it's just that the insecurity usually is limited to leaking sensitive information or granting incorrect privileges over the app, rather than giving up complete control of the OS process the JVM is running in.

So the way I see it is that good software engineering requires discipline in all languages, the difference is what happens when our discipline fails, and how much we're charged by other languages (in performance and footprint and C++ features you like) for insurance against that. If the insurance provided by some other language is worth it for your project, take it. If the features provided by C++ are worth paying for with the risk of undefined behaviour, take C++. I don't think there's much mileage in trying to argue, as if it was a global property that's the same for everyone, whether the benefits of C++ "justify" the costs. They're justified within the terms of reference for the design of the C++ language, which are that you don't pay for what you don't use. Hence, correct programs should not be made slower in order that incorrect programs get a useful error message instead of UB, and most of the time behaviour in unusual cases (e.g. << 32 of a 32-bit value) should not be defined (e.g. to result in 0) if that would require the unusual case to be checked for explicitly on hardware which the committee wants to support C++ "efficiently".
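
To illustrate the << 32 case just mentioned, here is a small sketch of my own (not Steve's); the hardware disagreement is part of why a single result isn't mandated.

#include <cstdint>

std::uint32_t shift_left(std::uint32_t x, unsigned n) {
    // For a 32-bit operand, n >= 32 is undefined behaviour; the compiler is free
    // to emit a bare shift instruction and let the hardware do whatever it does.
    return x << n;
}

// shift_left(1, 32) might yield 1 on x86 (the shift count is masked to 5 bits),
// 0 on other ISAs, or something else entirely once the optimizer assumes n < 32.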

Look at another example: I don't think the performance benefits of Intel's professional C and C++ compiler justify the cost of buying it. Hence, I haven't bought it. Doesn't mean others will make the same calculation I made, or that I will always make the same calculation in future.

Steve Jessop
2

Compilers and programming languages are one of my favorite topics. In the past I did some research related with compilers and I have found many many times undefined behavior.

C++ and Java are very popular. That does not mean they have a great design. They are widely used because they took risks to the detriment of their design quality, just to gain acceptance. Java went for garbage collection, a virtual machine, and a pointer-free appearance. They were partly pioneers and could not learn from many previous projects.

In the case of C++ one of the main goals was to give object-oriented programming to C users. Even C programs were supposed to compile with a C++ compiler. That left a lot of nasty open points, and C already had many ambiguities. C++'s emphasis was power and popularity, not integrity. Not many languages give you multiple inheritance; C++ gives you that, although not in a very polished way. Undefined behavior will always be there to support its glory and backwards compatibility.

If you really want a robust and well-defined language you must look somewhere else. Sadly that is not the main concern of most people. Ada, for example, is a great language where clear and defined behavior is important, but hardly anyone cares about the language because of its narrow user base. I am biased with the example because I really like that language; I posted something on my blog, but if you want to learn more about how a language definition can help you have fewer bugs even before you compile, have a look at these slides.

I am not saying C++ is a bad language! It just has different goals, and I love working with it. You also have a large community, great tools, and much more great stuff such as the STL, Boost and Qt. But your doubt is also the root of becoming a great C++ programmer. If you want to be great with C++ this should be one of your concerns. I would encourage you to read the previous slides and also this critique. It will help you a lot to understand those times when the language is not doing what you expect.

And by the way, undefined behavior goes totally against portability. In Ada, for example, you have control over the layout of data structures (in C and C++ it can change according to machine and compiler). Threads are part of the language. So porting C and C++ software will give you more pain than pleasure.

SystematicFrank
  • 2
    Strange then that vast amounts of highly portable software have been written in C and C++ - more than in any other languages, I would estimate. –  May 05 '10 at 10:45
  • 1
    @Neil. With C/C++ if you switch compilers you might alter the layout of your structures (worse when switching microprocessors). If you use threads in Linux you will have to use a different library when going for Windows. Ada does not have those problems (nor many others), but no one uses it because of its difficulty and lack of popularity. That you can easily shoot your foot (C) or blow it away (C++) is what people love, because both give immediate low-level access to all the power of their CPUs. Portability could be easier with other languages, but even I would choose C/C++ just because of their popularity – SystematicFrank May 05 '10 at 16:24
  • 2
    At the time of this comment I have noticed that after clicking the "undefined behavior" tag, all the questions in Stack Overflow (except one) are related with either the C or C++ tags – SystematicFrank May 05 '10 at 16:30
  • Because undefined behaviour is an integral part of the C and the C++ standards: 1.3.13. [defns.undefined] `behavior, such as might arise upon use of an erroneous program construct or erroneous data, for which this International Standard imposes no requirements. [...] [ Note: permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment[...] —end note ]` – Sebastian Mach Jun 21 '11 at 10:50
  • 1
    @phresnel: Unfortunately, hyper-modern philosophy deprecates the "...behaving...in a documented manner characteristic from the environment" suggestion. Given `unsigned char ch = getchar(); if (ch < 216) printf("Hey"); ch*=(ch*ch*ch*ch);`, hyper-modern philosophy suggests that the value of being able to infer from the last statement that `ch` will be less than 216 (and thus the printf should always execute) exceeds the value of having the code perform mod-256 arithmetic for all values of ch. – supercat Jun 23 '15 at 22:36
  • A common source of UB in many programs comes from situations where the Standard defines the behavior of a category of actions, but elsewhere says that an overlapping category of actions invoke UB. In some cases where both statements apply, it's sufficiently obvious that the defined behavior should be given priority that nobody would seriously suggest that any sane implementation should do otherwise. Unfortunately, there's no general way of knowing which such cases are sufficiently "obvious" that compiler writers will treat them as defined. – supercat Aug 06 '18 at 19:20
2

It's important to be clear on the differences between undefined behavior and implementation-defined behavior. Implementation-defined behavior gives compiler writers the opportunity to add extensions to the language in order to leverage their platform. Such extensions are necessary in order to write code that works in the real world.
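
For contrast, a few examples of implementation-defined behaviour (my illustration, not from the answer): each implementation must pick an answer and document it, but different implementations may pick differently.

#include <climits>
#include <iostream>

int main() {
    // All of these are implementation-defined rather than undefined: the compiler
    // documents a result, but the result varies between platforms.
    std::cout << "bits in a char: " << CHAR_BIT << '\n';
    std::cout << "sizeof(long):   " << sizeof(long) << '\n';
    std::cout << "plain char is " << (CHAR_MIN < 0 ? "signed" : "unsigned") << '\n';
    std::cout << "-1 >> 1 yields: " << (-1 >> 1) << '\n';   // implementation-defined before C++20
}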

UB on the other hand exists in cases where it is difficult or impossible to engineer a solution without imposing major changes in the language or big differences from C. One example taken from a page where BS talks about this is:

int a[10];
a[100] = 0; // range error
int* p = a;
// ...
p[100] = 0; // range error (unless we gave p a better value before that assignment)

The range error is UB. It is an error, but how precisely the platform should deal with this is undefined by the Standard because the Standard can't define it. Each platform is different. It can't be engineered to an error because this would necessitate including automatic range checking in the language, which would require a major change to the language's feature set. The p[100] = 0 error is even more difficult for the language to generate a diagnostic for, either at compile- or run-time, because the compiler can't know what p really points to without run-time support.

John Dibling
  • If a structure ends with a single-element array `arr`, could a standards-conforming compiler replace any references of the form `arr[i]` with `arr[0]`? I don't know that single-element arrays are common enough to be especially worth exploiting, but certainly the code for `arr[0]` could be faster and more compact than code for `arr[i]`. My guess would be that `arr[i]` would be undefined behavior for i!=0, whether or not storage is allocated beyond the structure, but that even if the struct hack is illegitimate, it's common enough that compilers should accommodate it. – supercat Nov 20 '11 at 19:49
1

I asked myself that same question a few years ago. I stopped considering it right away, when I tried to provide a proper definition for the behavior of a function that writes to a null pointer.

Not all devices have a concept of protected memory, so you can't possibly rely on the system to protect you via a segfault or similar. Not all devices have read-only memory, so you can't possibly say that the write simply does nothing. The only other option I could think of is to require that the application raise an exception [or abort, or something] without help from the system. But in that case, the compiler has to insert code before every single memory write to check for null, unless it can guarantee that the pointer has not changed since the last memory write. That is clearly unacceptable.
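
Roughly what "defining" a write through a null pointer would force every conforming implementation to emit, sketched as source (my illustration; the answer gives no code):

#include <cstdlib>

void store(int* p, int value) {
    // Without UB, a check like this would have to precede every write the compiler
    // cannot prove safe, on every platform, protected memory or not.
    if (p == nullptr) {
        std::abort();   // or raise an exception, whatever the standard mandated
    }
    *p = value;
}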

So, leaving the behavior undefined was the only logical decision I could come to, without saying "Compliant C++ compilers can only be implemented on platforms with protected memory."

Dennis Zickefoose
  • "That is clearly unacceptable". I don't think it's all that clear. I've worked with a Java JIT that did exactly this on certain CPUs. Performance was fine. C and C++ programmers are pretty much defined as people to whom this is unacceptable, but there's no particular reason that they (we - I include myself for some projects) should be (a) numerous, or (b) always correct to rule it out ;-) – Steve Jessop May 05 '10 at 11:01
  • @Steve: I agree, most of the times it's perfectly acceptable to check, the only issue is that in a number of tight loops we need unchecked versions to make things run faster (or be doomed). Unfortunately programmers think in term of binary and often use the same idiom everywhere, thus writing the same unchecked calls routines even when speed isn't required :'( – Matthieu M. May 05 '10 at 11:24
  • The Java approach to this is to let the JIT hoist bounds checks outside the loop where possible. You're right, there are some cases where the programmer knows the bounds will not be exceeded, but the proof is too hard for the compiler/JIT to produce, *and* the performance cost would be significant. So, OK, if there did not exist a language in which the checks could be omitted, that would be unacceptable and someone would invent one. And quickly use it to omit bounds checks in cases where their proof that the bounds will not be exceeded is unavailable to the compiler because it's *wrong* ;-) – Steve Jessop May 05 '10 at 11:34
  • You don't need to check a pointer for `null` before every access to it. You need to check it only after possible modification. – anton_rh Feb 27 '18 at 06:02
1

Here's my favourite: after you've done delete on a non-null pointer, using it (not only dereferencing it, but also casting it, etc.) is UB (see this question).

How you can run into UB:

{
    char* pointer = new char[10];
    delete[] pointer;
    // some other code
    printf( "deleted %p\n", (void*)pointer );   // merely using the freed pointer's value is UB
}

Now on all architectures I know, the code above will run fine. Teaching the compiler or runtime to perform analysis of such situations is very hard and expensive. Don't forget that sometimes there might be millions of lines of code between the delete and the use of the pointer. Setting pointers to null immediately after delete can be costly, so it's not a universal solution either.

That's why there's the concept of UB. You don't want UB in your code. Maybe it works, maybe it doesn't. It works on this implementation and breaks on another.

sharptooth
  • This involves a problem with the C/C++ 'delete'. Therefore it is solved by a single word beginning with J. – alan2here Mar 14 '11 at 17:43
  • 1
    @alan2here: Where exactly is a *problem with delete*? – sharptooth Mar 15 '11 at 06:12
  • 1
    If there was no problem there would be no languages with automatic cleanup, like Java, ultimately though it's a matter of opinion. Keeping track of what needs deleting and cleaning it up properly in C++ can be far from trivial, depending on how you code this might never be a problem for you, or it might turn simple programs into highly complex, buggy ones. In Java if you can refer to something, like as you are using 'pointer' in your example, then it exists. – alan2here Mar 16 '11 at 15:50
  • So in Java the only way presuming 'pointer' refers to something sensible anyway is if 'pointer' went out of scope before the printf, whereas in C++ deleting what the pointer refers to also causes a problem. In Java what the pointer refers to is deleted, perhaps internally with something similar to a delete, but only when nothing can refer to it. – alan2here Mar 16 '11 at 16:00
  • 3
    @alan2here: This is not an issue of trying to access a deallocated object - just printing the address is already UB. And this is not a problem with `delete` - it's a problem with the developer. – sharptooth Mar 17 '11 at 06:05
  • 1
    Ahh, I can see now that you're not trying to access the object on the end of the pointer, just print the address; this should just print the address, shouldn't it? I'm surprised that this produces UB. Is it something to do with the array? Maybe it wouldn't be a problem with std::vector and cout instead. You still wouldn't have to blame the developer in Java; you wouldn't get this scenario to occur. But it doesn't look like you would get UB in C++; it must just be a C++ oddity. I've heard that it doesn't change much, but I hope the new C++ standard clears stuff like this up a little. – alan2here Mar 20 '11 at 23:08
  • 2
    @alan2here: Here's the explanation of what can actually go wrong: http://stackoverflow.com/questions/1866461/why-should-i-not-try-to-use-this-value-after-delete-this/1866543#1866543 – sharptooth Mar 21 '11 at 06:24
0

There are times when undefined behavior is good. Take a big int for example.

union BitInt
{
    __int64 Whole;
    struct
    {
        int Upper;
        int Lower; // or maybe it's lower upper. Depends on architecture
    } Parts;
};

The spec says if we last read or wrote to Whole then reading/writing from Parts is undefined.

Now, that's just a tad silly to me because if we couldn't touch any other parts of the union then there is no point in having the union in the first place, right?

But anyway, maybe some functions will take __int64 while other functions take the two separate ints. Rather than convert every time, we can just use this union. Every compiler I know treats this undefined behavior in a pretty clear way. So in my opinion undefined behavior isn't so bad here.
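
For what it's worth, here is a portable way to get the same split/join without relying on the union's layout (my sketch, not part of the answer, assuming a 64-bit value split into two 32-bit halves):

#include <cstdint>

// Extract the halves with shifts and masks instead of type-punning through a union.
std::uint32_t upper_half(std::uint64_t whole) {
    return static_cast<std::uint32_t>(whole >> 32);
}

std::uint32_t lower_half(std::uint64_t whole) {
    return static_cast<std::uint32_t>(whole & 0xFFFFFFFFu);
}

std::uint64_t join(std::uint32_t upper, std::uint32_t lower) {
    return (static_cast<std::uint64_t>(upper) << 32) | lower;
}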

ProgramMax
  • 1
    As you note in the comment yourself, this behaviour is architecture dependant regarding endianness (and also regarding padding of fields, and size of fields). So it works on some platforms and does not work on others. If you stay on a platform where it works, fine for you. But then you are leaning on that specific platform architecture and compiler implementation. – Péter Török May 05 '10 at 09:41
  • 1
    Yeah, this is true. But you get what I'm saying about "Then why even have union if the only benefit it provides is undefined behavior anyway?" What I'm trying to point out is the union keyword exists because not all undefined behavior is bad. – ProgramMax May 05 '10 at 10:00
  • 6
    Unions compress multiple types into a single block of memory. This usage is quite useful, and does not require any dependence on undefined behavior. – Dennis Zickefoose May 05 '10 at 10:08
  • Hrmm. The C standard section 6.5 / 7 says that once you write to one part of a union access to all others is undefined. But looking at the C++ spec section 9.5 about unions doesn't say anything about that. – ProgramMax May 05 '10 at 10:54
  • 1
    @ProgramMax: 9.5 does actually speak to this, only in riddles you must decipher: 9.5.1: "In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time." – John Dibling May 05 '10 at 14:09
  • Ah, thanks John. :D I jumped to 9.5 "Unions" and then skimmed for "undefined" and didn't find it. But you are right, it is there...just in hints. :D – ProgramMax May 05 '10 at 18:52
  • 1
    There's tons of undefined behavior in C++, and most of it is undefined by omission. – MSalters May 06 '10 at 09:32
  • @ProgramMax: e.g., allowed use of a union would be to store leaf nodes and inner nodes in some tree-node type, where leaf-nodes don't use inner-node information, and vice versa. However, you can easily make your bigint thingy portable by replacing the union with almost equally fast bit-operations. – Sebastian Mach Jun 21 '11 at 10:54
  • Even if the standard specifies that some particular situation (say, subtracting one from INT_MIN) will invoke Undefined Behavior, an implementation may perfectly legitimately define precisely what will happen in that situation (e.g. the result of the subtraction will be MAX_INT). When using such a compiler, the situation in question would not produce Undefined Behavior, since the documentation would define it. I would expect that the vast majority of programs where `union` is used as illustrated are in fact run on systems whose documentation specifies that it will work as expected. – supercat Jul 02 '12 at 22:37