52

Most of the conversations around undefined behavior (UB) talk about how some platforms can do this, or some compilers do that.

What if you are only interested in one platform and only one compiler (same version) and you know you will be using them for years?

Nothing is changing but the code, and the UB is not implementation-defined.

Once the UB has manifested for that architecture and that compiler and you have tested, can't you assume that from then on whatever the compiler did with the UB the first time, it will do that every time?

Note: I know undefined behavior is very, very bad, but when I pointed out UB in code written by somebody in this situation, they asked this question, and I didn't have anything better to say than: if you ever have to upgrade or port, all the UB will be very expensive to fix.

It seems there are different categories of Behavior:

  1. Defined - This is behavior documented to work by the standards
  2. Supported - This is behavior documented to be supported, a.k.a. implementation-defined
  3. Extensions - This is a documented addition; support for low-level bit operations like popcount and branch hints falls into this category
  4. Constant - While not documented, these are behaviors that are likely to be consistent on a given platform; things like endianness and sizeof(int), while not portable, are unlikely to change
  5. Reasonable - Generally safe and usually legacy: casting from unsigned to signed, using the low bit of a pointer as temp space
  6. Dangerous - Reading uninitialized or unallocated memory, returning a pointer or reference to a local variable, using memcpy on a non-POD class

It would seem that Constant might be invariant within a patch version on one platform. The line between Reasonable and Dangerous seems to be moving, with more and more behavior shifting towards Dangerous as compilers become more aggressive in their optimizations.
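
As a concrete illustration of the Dangerous category, here is a minimal sketch (the Record type is hypothetical, not something from the question) of what "using memcpy on a non-POD class" looks like:

```
#include <cstring>
#include <string>

struct Record {        // non-POD: owns a std::string
    std::string name;
    int id;
};

void clone(Record& dst, const Record& src) {
    // "Dangerous": memcpy of a non-trivially-copyable type is UB. It copies
    // the string's internals byte for byte, so if the string owns heap memory,
    // dst and src now share it and will double-free it on destruction.
    std::memcpy(&dst, &src, sizeof(Record));
}
```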

Glenn Teitelbaum
  • 10,108
  • 3
  • 36
  • 80
  • 2
    'defined' UB can be changed between even patch versions of compiler, or even by system update (without change to original code). Even one platform does not make you safe – Hcorg Aug 28 '15 at 13:50
  • I think that undefined behavior can also be linked to a variable's value (i.e. a null pointer), and that's not something you can always predict. – Zohar81 Aug 28 '15 at 13:51
  • 9
    The only reason this might not be a question which "attracts opinion-based answers" is that probably most people agree that UB is bad;) I think if you know about UB in your program and decidedly don't fix it, it can very easily bite you in the ass. You can't honestly assume that you will use the **exact same compiler** for the lifetime of your code. And once the compiler changes, anything can happen. This might just be a case of rationalization from the person who was caught red-handed with undefined behaviour. Just go and fix that bug:) – Andras Deak -- Слава Україні Aug 28 '15 at 13:51
  • 2
    Even with the same compiler you never know if one day your UB will kick your ass, just because a certain bit of memory just has the right value to blow every thing up. IMHO the worst kind of UB is the one that looks fine and you think you have it under control until one day it teaches you the meaning of undefined. – 463035818_is_not_an_ai Aug 28 '15 at 14:06
  • Is there a way to ask the same thing and not "attract opinion-based answers", as per the comment by @AndrasDeak? In theory I thought the question was not opinion-based. I also want to avoid invalidating answers if I do change it – Glenn Teitelbaum Aug 28 '15 at 14:34
  • @GlennTeitelbaum, no, my point was that this question seems too philosophical to me altogether, and maybe more suited to some other SE sites (such as [programmers.SE](http://programmers.stackexchange.com/)). – Andras Deak -- Слава Україні Aug 28 '15 at 14:39
  • 1
    "What if you are only interested in one platform and only one compiler (same version) and you know you will be using them for years?" - If you can predict the future with that degree of certainty, you are wasting your time developing software. You can make much more money for much less effort ;-) Seriously, the only guaranteed rule in software development is "stuff happens, and usually it happens at the most inconvenient time possible" – alephzero Aug 28 '15 at 22:54
  • 1
    Do you also intend never to change your source code? If so, then you need only compile once, and need only check the generated machine code byte by byte that nothing unexpected happened. On the other hand, if you intend to modify your source, then relatively unrelated changes may influence how UB-infected code is optimized and hence result in different interpretations of the UB code. – Hagen von Eitzen Aug 29 '15 at 20:09
  • Common Lisp has a (sometimes nice, sometimes frustrating) notion called "implementation-dependent". E.g., there's a constant **ARRAY-DIMENSION-LIMIT** that every implementation must define, and it must not be less than 1024, but the specific value is not defined by the standard. I think that depending on that kind of behavior is not as bad, if you're prepared to commit to particular implementation, whereas completely undefined behavior is trickier. – Joshua Taylor Aug 29 '15 at 21:22
  • The concept of wobbly values, which I mention in my [answer here](http://stackoverflow.com/a/31746063/1708801) show one case where your conditions are not sufficient to ensure predictable behavior. I don't know of an implementation that does this but it shows how problematic assumptions around undefined behavior can be. Basically as I say, don't use undefined behavior, there is almost always an alternative. – Shafik Yaghmour Sep 01 '15 at 16:38
  • "Once the UB has manifested for that architecture and that compiler and you have tested, can't you assume that from then on whatever the compiler did with the UB the first time, it will do that every time?" *No*. Even the idea that "the UB has manifested" is wrong. "Program testing can be used to show the presence of bugs, but never to show their absence!" - Edsger Dijkstra – philipxy Sep 02 '15 at 00:02

12 Answers

56

OS changes, innocuous system changes (different hardware version!), or compiler changes can all cause previously "working" UB to not work.

But it is worse than that.

Sometimes a change to an unrelated compilation unit, or far away code in the same compilation unit, can cause previously "working" UB to not work. One example is two inline functions or methods with different definitions but the same signature: one is silently discarded during linking, and completely innocuous code changes can change which one is discarded.
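
A minimal sketch of that ODR trap (file names are illustrative, not from the answer):

```
// a.cpp
inline int answer() { return 42; }   // one definition...
int from_a() { return answer(); }

// b.cpp
inline int answer() { return 7; }    // ...different definition, same signature:
int from_b() { return answer(); }    // an ODR violation, which is UB

// The linker keeps only one answer(). Which one survives can flip with an
// innocuous change (link order, a new object file), so from_a() and from_b()
// may both return 42 in this build and both return 7 in the next.
```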

Code that works in one context can suddenly stop working with the same compiler, OS and hardware when you use it in a different context. An example of this is violating strict aliasing; the compiled code might work when called at spot A, but when inlined (possibly at link time!) the code can change meaning.
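
A minimal sketch of the aliasing case (illustrative, not code from the answer):

```
// If ip and fp are made to point at the same memory, writing through one and
// reading through the other violates strict aliasing.
float aliasing_trap(int* ip, float* fp) {
    *fp = 1.0f;
    *ip = 0;        // UB if ip and fp actually alias
    return *fp;     // out of line this may return 0.0f when they alias; once
                    // inlined, the optimizer may assume no aliasing and
                    // return 1.0f instead
}
```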

Your code, if part of a larger project, could conditionally call some 3rd party code (say, a shell extension that previews an image type in a file open dialog) that changes the state of some flags (floating point precision, locale, integer overflow flags, division by zero behavior, etc). Your code, which worked fine before, now exhibits completely different behavior.

Next, many kinds of undefined behavior are inherently non-deterministic. Accessing memory through a pointer after it is freed (even writing to it) might be safe 99 times out of 100, but 1 time in 100 the page was swapped out, or something else was written there before you got to it. Now you have memory corruption. It passes all your tests, but you lacked complete knowledge of what can go wrong.
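
A minimal sketch of that kind of use-after-free (illustrative only):

```
#include <iostream>

int main() {
    int* p = new int(42);
    delete p;
    std::cout << *p << '\n';   // UB: may print 42 in every test run, until the
                               // day the allocator reuses or unmaps that memory
}
```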

By using undefined behavior, you commit yourself to a complete understanding of the C++ standard, everything your compiler can do in that situation, and every way the runtime environment can react. You have to audit the produced assembly, not the C++ source, possibly for the entire program, every time you build it! You also commit everyone who reads that code, or who modifies that code, to that level of knowledge.

It is sometimes still worth it.

Fastest Possible Delegates uses UB and knowledge about calling conventions to be a really fast non-owning std::function-like type.

Impossibly Fast Delegates competes. It is faster in some situations, slower in others, and is compliant with the C++ standard.

Using the UB might be worth it, for the performance boost. It is rare that you gain something other than performance (speed or memory usage) from such UB hackery.

Another example I've seen is when we had to register a callback with a poor C API that just took a function pointer. We'd create a function (compiled without optimization), copy it to another page, modify a pointer constant within that function, then mark that page as executable, allowing us to secretly pass a pointer along with the function pointer to the callback.

An alternative implementation would be to have some fixed-size set of functions (10? 100? 1000? 1 million?), all of which look up a std::function in a global array and invoke it. This would put a limit on how many such callbacks we could install at any one time, but in practice that was sufficient.
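
A rough sketch of what such a fixed-size trampoline table could look like (all names here, including some_c_api_register, are illustrative, not from the original code):

```
#include <array>
#include <cstddef>
#include <functional>
#include <utility>

using Callback = void (*)();                        // what the C API accepts
constexpr std::size_t kSlots = 16;                  // the fixed limit

std::array<std::function<void()>, kSlots> g_slots;  // the real callbacks

template <std::size_t I>
void trampoline() { g_slots[I](); }                 // one plain function per slot

template <std::size_t... Is>
constexpr std::array<Callback, kSlots> make_table(std::index_sequence<Is...>) {
    return {{ &trampoline<Is>... }};
}

constexpr auto g_table = make_table(std::make_index_sequence<kSlots>{});

// Usage: store the closure, then hand the C API a genuine function pointer.
//   g_slots[3] = [] { /* ... */ };
//   some_c_api_register(g_table[3]);   // some_c_api_register is hypothetical
```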

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • +1 but I'm still unclear how "code might work when called at spot A, but when inlined (possibly at link-time!) the code can change meaning." and I'm frankly interested in "It is sometimes still worth it." – Glenn Teitelbaum Aug 28 '15 at 14:50
  • 3
    @GlennTeitelbaum it is undefined behavior: it can do anything, and things you don't think matter can make it do different things. Strict aliasing means that a pointer to A and a pointer to B can be presumed not to point to the same thing. You modify one via the other in a way that is UB. It doesn't matter in the context of a small function; when inlined into a larger context, the UB occurs, the compiler is *free to detect that UB*, and determine certain code branches if they occur *do not matter*, and ignore those possibilities. – Yakk - Adam Nevraumont Aug 28 '15 at 15:11
  • @Glenn example of "UB that might be worth it" added. – Yakk - Adam Nevraumont Aug 28 '15 at 15:16
  • 11
    Or in short, when you use Undefined Behavior to produce a binary, it's no longer the source code you need to audit/validate, it's the binary. I am not sure there are many static analyzers for assembly... – Matthieu M. Aug 28 '15 at 17:30
  • 1
    @MatthieuM.: I've written static analyzers for binary code, though they tended to be rather focused on tracking down particular compiler bugs (e.g. identifying every instruction that appeared to be reachable with a certain mode bit set and also with that same mode bit clear, to identify a compiler bug that caused two branches of an "if" to have their tails erroneously merged even though they were reached with the mode flag in different states). – supercat Aug 28 '15 at 19:30
  • @matt "audit the assembly" stolen – Yakk - Adam Nevraumont Aug 29 '15 at 12:21
  • @Yakk: you're welcome to it, I really liked your answer to start with, so I commented here to add to it rather than write another answer myself in the first place :) – Matthieu M. Aug 29 '15 at 12:40
  • +1 for acknowledging that it is still sometimes worth it, and for the "fastest possible delegates" link, which includes this extremely telling quote: – Kyle Strand Sep 01 '15 at 16:32
  • 1
    "It is absurd that the C++ Standard allows you to cast between member function pointers, but doesn't allow you to invoke them once you've done it.... the cast won't always work on many popular compilers (so, casting is standard, but not portable), [and] on all compilers, if the cast is successful, invoking the cast member function pointer behaves exactly as you would hope: there is no need for it to be classed as "undefined behavior". (Invocation is portable, but not standard!) " – Kyle Strand Sep 01 '15 at 16:36
20

No, that's not safe. First of all, you will have to fix everything, not only the compiler version. I do not have particular examples, but I guess that a different (upgraded) OS, or even an upgraded processor might change UB results.

Moreover, even different input data to your program can change UB behavior. For example, an out-of-bounds array access (at least without optimizations) usually depends on whatever is in the memory after the array. UPD: see the great answer by Yakk for more discussion on this.
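
For example, a minimal sketch (illustrative, not from the answer) of an out-of-bounds read whose "result" depends on whatever happens to follow the array:

```
#include <cstdio>

int main() {
    int a[4] = {1, 2, 3, 4};
    int i = 4;                   // imagine this index arrived as program input
    std::printf("%d\n", a[i]);   // out of bounds: an unoptimized build typically
                                 // prints whatever sits after the array, so the
                                 // output depends on layout, input and build flags
}
```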

And a bigger problem is optimization and other compiler flags. UB may manifest itself in different ways depending on optimization flags, and it's quite difficult to imagine somebody always using the same optimization flags (at least you'll use different flags for debug and release).

UPD: just noticed that you never mentioned fixing a compiler version, only fixing the compiler itself. Then everything is even less safe: new compiler versions can definitely change UB behavior. From this series of blog posts:

The important and scary thing to realize is that just about any optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will keep getting better, and a significant part of their reason for existing is to expose secondary optimizations like the ones above.

Petr
  • 9,812
  • 1
  • 28
  • 52
  • 7
    I found this to be an eye-opener about what the compiler does with undefined behavior: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html. – Jens Aug 28 '15 at 13:54
  • 2
    Some UB is undefined between compilations, but other UB are undefined between different runs of the same code. So you'll better avoid it in every case. – petersohn Aug 28 '15 at 14:27
  • The question is about what happens if nothing but the code changes; "changing compilers" was vague, so I clarified (same version). I consider patching compiler 1.5.2.7 to 1.5.2.8 to be changing the compiler, but hopefully I've made that clearer – Glenn Teitelbaum Aug 28 '15 at 16:16
9

This is basically a question about a specific C++ implementation. "Can I assume that a specific behavior, undefined by the standard, will continue to be handled by ($CXX) on platform XYZ in the same way under circumstances UVW?"

I think you should clarify exactly what compiler and platform you are working with, and then consult their documentation to see if they make any guarantees; otherwise the question is fundamentally unanswerable.

The whole point of undefined behavior is that the C++ standard doesn't specify what happens, so if you are looking for some kind of guarantee from the standard that it's "ok" you aren't going to find it. If you are asking whether the "community at large" considers it safe, that's primarily opinion based.

Once the UB has manifested for that architecture and that compiler and you have tested, can't you assume that from then on whatever the compiler did with the UB the first time, it will do that every time?

Only if the compiler makers guarantee that you can do this, otherwise, no, it's wishful thinking.


Let me try to answer again in a slightly different way.

As we all know, in normal software engineering, and engineering at large, programmers / engineers are taught to do things according to a standard, the compiler writers / parts manufacturers produce parts / tools that meet a standard, and at the end you produce something where "under the assumptions of the standards, my engineering work shows that this product will work", and then you test it and ship it.

Suppose you had a crazy uncle Jimbo, and one day he got all his tools out and a whole bunch of two-by-fours, and worked for weeks and made a makeshift roller coaster in your backyard. And then you run it, and sure enough it doesn't crash. You even run it ten times, and it doesn't crash. Now Jimbo is not an engineer, so this is not made according to standards. But if it didn't crash after even ten times, that means it's safe and you can start charging admission to the public, right?

To a large extent what's safe and what isn't is a sociological question. But if you want to just make it a simple question of "when can I reasonably assume that no one would get hurt by me charging admission, when I can't really assume anything about the product", this is how I would do it. Suppose I estimate that, if I start charging admission to the public, I'll run it for X years, and in that time, maybe 100,000 people will ride it. If it's basically a biased coin flip whether it breaks or not, then what I would want to see is something like, "this device has been run a million times with crash dummies, and it never crashed or showed hints of breaking." Then I could quite reasonably believe that if I start charging admission to the public, the odds that anyone will ever get hurt are quite low, even though there are no rigorous engineering standards involved. That would just be based on a general knowledge of statistics and mechanics.

In relation to your question, I would say, if you are shipping code with undefined behavior, which no one, either the standard, the compiler maker, or anyone else will support, that's basically "crazy uncle jimbo" engineering, and it's only "okay" if you do vastly increased amounts of testing to verify that it meets your needs, based on a general knowledge of statistics and computers.

Peter Mortensen
  • 30,738
  • 21
  • 105
  • 131
Chris Beck
  • 15,614
  • 4
  • 51
  • 87
  • 1
    What's worse is that most parts and part groupings of physical systems fail in roughly predictable ways with roughly local effects, but digital/discrete systems can fail roughly anywhere and everywhere from one bit being off. Eg Parnas et al [Evaluation of Safety- Critical Software](http://www.cs.unm.edu/~cris/591/parnas1990evaluation.pdf). – philipxy Sep 01 '15 at 23:51
  • "In conventional engineering, every design and manufacturing dimension can be characterized by a tolerance. One is not required to get things exactly right; being within the specified *tolerance* of the right value is good enough. The use of a tolerance is justified by the assumption that small errors have small consequences. It is well known that in software, trivial clerical errors can have major consequences. No useful interpretation of tolerance is known for software. A single punctuation error can be disastrous, even though fundamental oversights sometimes have negligible effects." – philipxy Sep 01 '15 at 23:54
7

What you are referring to is more likely implementation-defined and not undefined behavior. The former is when the standard doesn't tell you what will happen, but it should work the same if you are using the same compiler and the same platform. An example of this is assuming that an int is 4 bytes long. UB is something more serious: there the standard doesn't say anything. It is possible that for a given compiler and platform it works, but it is also possible that it works only in some cases.

An example is using uninitialized values. If you use an uninitialized bool in an if, you may get true or false, and it may happen that it is always what you want, but the code can break in several surprising ways.

Another example is dereferencing a null pointer. While it will probably result in a segfault in most cases, the standard doesn't even require the program to produce the same result every time it is run.
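
A minimal sketch of the uninitialized-bool case described above (illustrative only):

```
#include <cstdio>

int main() {
    bool flag;             // never initialized: reading it is UB
    if (flag)              // may act as true, as false, or inconsistently,
        std::puts("on");   // since the stored byte need not be 0 or 1
    else
        std::puts("off");
}
```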

In summary, if you are doing something that is implementation-defined, then you are safe if you are only developing for one platform and you have tested that it works. If you are doing something that is undefined behavior, then you are probably not safe in any case. It may be that it works, but nothing guarantees it.

petersohn
  • 11,292
  • 13
  • 61
  • 98
  • If the compiler vendor does not state that `int` is 4 bytes long, is that implementation defined? Or just a reasonable expectation of how the UB will react on that architecture? `time_t` remained constant for years, but no vendor guaranteed it, and then they changed it years later. I would imagine **defined** needs to be explicitly stated, or it's just a reasonable expectation and completely UB. That said, there seems to be a difference on a consistent platform between `int` size and uninitialized variables – Glenn Teitelbaum Aug 28 '15 at 14:46
  • The C standard only states that `short` <= `int` <= `long` <= `long long`, but it should be the same across compilations with the same compiler and architecture. That's why it's implementation defined. However, `int32_t` is guaranteed to have 32 bits, which is exactly defined. – petersohn Aug 28 '15 at 17:47
  • 1
    The Standard can only specify an action as invoking Implementation-Defined behavior if every possible C implementations will be able to specify a *completely* predictable behavior. If overflow were Implementation-Defined rather than undefined, and a hardware platform trapped overflows for ADD instructions but not for INC, an implementation for such a platform would not be allowed to use INC instructions on signed numbers unless it could guarantee consistent behavior. I see no reason to believe that implementations were never expected to try to document behaviors beyond those required... – supercat Aug 28 '15 at 19:25
  • 1
    ...to the extent that they could do so practically. Many kinds of useful code cannot be written efficiently without using behavior not defined by the Standard (e.g. write a version of `void addCopy(int *dest, int *src1, int *src2, int size) { for (int i=0; i – supercat Aug 28 '15 at 19:35
  • @supercat In a case like that, would it help to move `addCopy` from a C++ file to a C file in order to make unrelatedness explicit with the `restrict` qualifier? – Damian Yerrick Aug 29 '15 at 16:30
  • @tepples: If one were writing the function as shown, adding the `restrict` qualifier would make explicit a requirement that the memory regions not overlap [note that the function as written would work if they overlap, provided the destination address precedes any source that overlaps with it, but adding `restrict` would break that]. My point was that if one *wants* to write a function which can efficiently and correctly handle all cases where the destination and sources may overlap (with the semantics that it will always copy the "old" data), there's no efficient way to do so in... – supercat Aug 31 '15 at 13:53
  • @supercat Now I get it: there's no way to make `std::memmove` using only Standard-defined behavior other than as two copies. – Damian Yerrick Aug 31 '15 at 13:55
  • This example was actually better suited to C than C++ (I'm a C programmer) since C++ platforms are required to supply `std::less`; in C, if one wants to simply move memory, one can get around the inability to implement `memmove` oneself simply by using `memmove`, but if one wants to do something to the data while copying it there's no way to achieve that efficiently in standard C; most implementations provide a way to do that efficiently, but it requires using behavior upon which the Standard imposes no requirements. – supercat Aug 31 '15 at 14:02
5

Think about it a different way.

Undefined behavior is ALWAYS bad, and should never be used, because you never know what you will get.

However, you can temper that with

Behavior can be defined by parties other than just the language specification

Thus you should never rely on UB, ever, but you can find alternate sources which state that a certain behavior is DEFINED behavior for your compiler in your circumstances.

Yakk gave great examples regarding the fast delegate classes. In those cases, the author explicitly claims that they are engaging in undefined behavior, according to the spec. However, they then go on to explain a business reason why the behavior is better defined than that. For example, they declare that the memory layout of a member function pointer is unlikely to change in Visual Studio because there would be rampant business costs due to incompatibilities, which are distasteful to Microsoft. Thus they declare that the behavior is "de facto defined behavior."

Similar behavior can be seen in the typical Linux implementation of pthreads (to be compiled by gcc). There are cases where they make assumptions about what optimizations a compiler is allowed to invoke in multithreaded scenarios. Those assumptions are stated plainly in comments in the source code. How is this "de facto defined behavior"? Well, pthreads and gcc go kind of hand in hand. It would be considered unacceptable to add an optimization to gcc which broke pthreads, so nobody will ever do it.

However, you cannot make the same assumption. You may say "pthreads does it, so I should be able to as well." Then, someone makes an optimization, and updates gcc to work with it (perhaps using __sync calls instead of relying on volatile). Now pthreads keeps functioning... but your code doesn't anymore.

Also consider the case of MySQL (or was it Postgres?) where they found a buffer overflow error. The overflow had actually been checked for in the code, but the check relied on undefined behavior, so the latest gcc started optimizing the entire check out.
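
The general shape of such a check looks roughly like this (a hedged sketch, not the actual code from that bug report):

```
bool fits(const char* buf, const char* buf_end, unsigned len) {
    if (buf + len >= buf_end) return false;  // too long for the buffer
    if (buf + len < buf)      return false;  // intended "wraparound" check, but
                                             // forming a pointer past the buffer
                                             // is already UB, so the compiler may
                                             // assume it can't happen and delete
                                             // this test entirely
    return true;
}
```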

So, in all, look for an alternate source that defines the behavior, rather than using it while it is undefined. It is totally legit to find a reason why you know 1.0/0.0 equals infinity, rather than causing a floating point trap to occur. But never use that assumption without first proving that it is a valid definition of behavior for you and your compiler.

And please oh please oh please remember that we upgrade compilers every now and then.

Cort Ammon
  • 10,221
  • 31
  • 45
4

Historically, C compilers have generally tended to act in somewhat-predictable fashion even when not required to do so by the Standard. On most platforms, for example, a comparison between a null pointer and a pointer to a dead object will simply report that they are not equal (useful if code wishes to safely assert that the pointer is null and trap if it isn't). The Standard does not require compilers to do these things, but historically compilers which could do them easily have done so.

Unfortunately, some compiler writers have gotten the idea that if such a comparison could not be reached while the pointer was validly non-null, the compiler should omit the assertion code. Worse, if it can also determine that certain input would cause the code to be reached with an invalid non-null pointer, it should assume that such input will never be received, and omit all code which would handle such input.

Hopefully such compiler behavior will turn out to be a short-lived fad. Supposedly, it's driven by a desire to "optimize" code, but for most applications robustness is more important than speed, and having compilers mess with code that would have limited the damage caused by errant inputs or errant program behavior is a recipe for disaster.

Until then, however, one must be very careful to read compiler documentation closely, since there's no guarantee that a compiler writer won't have decided it is less important to support useful behaviors which, though widely supported, aren't mandated by the Standard (such as being able to safely check whether two arbitrary objects overlap) than to exploit every opportunity to eliminate code which the Standard doesn't require it to execute.

supercat
  • 77,689
  • 9
  • 166
  • 211
  • If you mean that historically, C compilers (or any compilers) were quite incapable of optimizing (or at least far worse), that's certainly true. And unoptimized machine-code is certainly easier to predict. – Deduplicator Sep 01 '15 at 21:05
  • @Deduplicator: Historically, compilers had options for optimization, some of which could affect code behavior using semantics which were different from (and in many ways more useful than) the Strict Aliasing Rule, but compiler documentation described the cases where various optimizations could pose problems and would need to be disabled for certain routines. – supercat Sep 01 '15 at 22:00
  • That comment reads to me like you are going back to "nothing changed". Could you elaborate, so I understand which crucial point I'm missing? – Deduplicator Sep 01 '15 at 22:02
  • @Deduplicator: Turbo C 2.0 had an optimization option which, given `void foo(void) { int x; bar(&x); x++; boz(); x++; printf("%d",x);}` might cause the compiler to keep x in SI or DI while code called `boz`. The compiler documentation specifically warned that it was necessary to disable that optimization when passing local variables to a routine that might persist a pointer which could be used to modify the variable after the routine returned. Failure to disable the optimization might cause `boz()` to see the value of `x` before the first increment, and might cause `foo` to ignore any... – supercat Sep 02 '15 at 02:12
  • ...change made to `x` by `boz()`, but the compiler would be fundamentally generating code according to what was written. That is very different from modern optimizers which, given `if (x< 65536) foo(); x*=y;` will make the call to `foo` unconditional in cases where they can determine that `y` must be greater than 32768, on the basis that if `x` weren't less than 65536 the code would engage in Undefined Behavior. – supercat Sep 02 '15 at 02:15
  • Yes, assuming that no local ever escapes the current scope if given opportunity is quite a different order of assumption than the latter, being non-conforming. But there is an option to make the latter defined anyway on most (or all?) compilers as an extension. – Deduplicator Sep 02 '15 at 11:21
  • @Deduplicator: If the Standard didn't specify that code which takes the address of a "register" variable is ill-formed, a useful meaning for "register" would be to authorize the compiler to keep a variable in register during any function call *whose parameters do not explicitly include the address of that variable* [possibly adding syntax constraints to enforce this, though that would impair one of the more common uses]. IMHO, such rules could probably be less intrusive and more useful than Strict Aliasing, though the fact that code taking the address of a register variable is ill-formed... – supercat Sep 02 '15 at 16:47
  • ...would be problematic. – supercat Sep 02 '15 at 16:47
  • Sure, that would, under the given assumptions, be an option. Though I would prefer a more generally applicable (and by the library-writer once, instead of the user always) notation for "won't retain any references". I think Rust does that. – Deduplicator Sep 02 '15 at 16:51
  • @Deduplicator: I agree. Perhaps allow `void register copymem(register *src, register const *dest, size_t n);`, though I'm not sure how to handle `memcpy` which could return `dest` as a pointer [the `register` qualify on the method would indicate it neither reads nor writes anything not passed into it]. Client-side, though, it could be useful to have a notation to indicate that within a certain scope, a compiler should be free, but not obligated, to cache a variable from outside that scope, when calling a function that might modify some outside variables, but not the one of interest. – supercat Sep 02 '15 at 17:02
  • @Deduplicator: I find the Strict Aliasing Rule curious, actually, in that the type of a variable really has very little to do with whether it could alias anything. It leads to situations where portability would require `memmove(&thing, &thing, sizeof(thing));`, creating an ironic situation where the compilers that would be just fine without the `memmove` are those where the `memmove` is most likely to be expensive; using `memcpy` would be faster on many platforms, and I don't think there's any platform that couldn't easily make `memcpy` work, but some compilers would use `memcpy`... – supercat Sep 02 '15 at 17:11
4

Undefined behavior can be altered by things such as the ambient temperature, which causes rotating hard disk latencies to change, which causes thread scheduling to change, which in turn changes the contents of the random garbage that's getting evaluated.

In short, not safe unless the compiler or the OS specifies the behavior (since the language standard didn't).

Joshua
  • 40,822
  • 8
  • 72
  • 132
3

There is a fundamental problem with undefined behavior of any kind: it is diagnosed by sanitizers and exploited by optimizers. A compiler can silently change how it treats such code from one version to another (e.g. by expanding its repertoire of optimizations), and suddenly you'll have some untraceable error in your program. This should be avoided.

There is undefined behavior that is made "defined" by your particular implementation, though. A left shift by a negative number of bits can be defined by your implementation, and it would be safe to use it there, as breaking changes of documented features occur quite rarely. One more common example is strict aliasing: GCC can disable this restriction with -fno-strict-aliasing.
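
A sketch of the kind of type punning this flag is usually invoked for (illustrative; still UB per the standard, merely tolerated when the compiler is told not to exploit the aliasing rules):

```
#include <cstdint>

// Built with something like: g++ -O2 -fno-strict-aliasing punning.cpp
std::uint32_t bits_of(float f) {
    return *reinterpret_cast<const std::uint32_t*>(&f);  // UB per the standard;
                                                         // relies on the flag /
                                                         // implementation behavior
}
```

The portable alternative is, of course, to std::memcpy the bytes instead of casting.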

Columbo
  • 60,038
  • 8
  • 155
  • 203
  • 1
    "A compiler can silently change behavior corresponding to those from one version to another" True, but the user is strictly mentioning the opposite where the implementation in general won't change at all as in embedded systems. – edmz Aug 28 '15 at 14:04
  • @black Can't see what you mean. *"Once the UB has manifested for that architecture and that compiler and you have tested, can't you assume that from then on whatever the compiler did with the UB the first time, it will do that every time?"* That does not talk about compiler versions, AFAICS. – Columbo Aug 28 '15 at 14:07
  • "What if you are only interested in one platform and only one compiler and you know you will be using them for years?" The "you'll be using them for years" is misleading. It could refer to what you say or what I do, namely where the compiler version won't change along with the compiler and OS, as you say. – edmz Aug 28 '15 at 14:14
3

While I agree with the answers that say that it's not safe even if you don't target multiple platforms, every rule can have exceptions.

I would like to present two examples where I'm confident that relying on undefined / implementation-defined behavior was the right choice.

  1. A single-shot program. It's not a program which is intended to be used by anyone, but it's a small and quickly written program created to calculate or generate something now. In such a case a "quick and dirty" solution can be the right choice, for example, if I know the endianness of my system and I don't want to bother with writing code which works with the other endianness. For example, I only needed it to perform a mathematical proof to know whether I'll be able to use a specific formula in my other, user-oriented program or not.

  2. Very small embedded devices. The cheapest microcontrollers have memory measured in a few hundred bytes. If you develop a small toy with blinking LEDs or a musical postcard, etc., every penny counts, because it will be produced in the millions with a very low profit per unit. Neither the processor nor the code ever changes, and if you have to use a different processor for the next generation of your product, you will probably have to rewrite your code anyway. A good example of undefined behavior in this case is that there are microcontrollers which guarantee a value of zero (or 255) for every memory location at power-up. In this case you can skip the initialization of your variables. If your microcontroller has only 256 bytes of memory, this can make the difference between a program which fits into the memory and code which doesn't.

Anyone who disagrees with point 2, please imagine what would happen if you told something like this to your boss:

"I know the hardware costs only $ 0.40 and we plan selling it for $ 0.50. However, the program with 40 lines of code I've written for it only works for this very specific type of processor, so if in the distant future we ever change to a different processor, the code will not be usable and I'll have to throw it out and write a new one. A standard-conforming program which works for every type of processor will not fit into our $ 0.40 processor. Therefore I request to use a processor which costs $ 0.60, because I refuse to write a program which is not portable."

vsz
  • 4,811
  • 7
  • 41
  • 78
  • 1
    Code which only had to be portable among compilers that were popular in 1990s could in many cases be written to be smaller, faster, more robust, and easier to read, than code which must avoid Undefined Behavior at all costs. Unfortunately, some people think it's more important to let compilers optimize useless programs than to let programmers write code which is small, fast, robust (on platforms which offer loose behavioral guarantees), and readable. – supercat Oct 09 '15 at 23:23
2

"Software that doesn't change, isn't being used."

If you are doing something unusual with pointers, there's probably a way to use casts to define what you want. Because of their nature, the results will not be "whatever the compiler did with the UB the first time". For example, when you refer to memory pointed at by an uninitialized pointer, you get a random address that is different every time you run the program.

Undefined behavior generally means you are doing something tricky, and you would be better off doing the task another way. For instance, this is undefined:

printf("%d %d", ++i, ++i);

It's hard to know what the intent would even be here, and the code should be re-thought.

Engineer
  • 834
  • 1
  • 13
  • 27
  • The citation (of whom?) in the first line is just false. For instance, if it were true, why would it be virtually impossible to uninstall `ed` (of "[ed is the standard text editor](http://www.gnu.org/fun/jokes/ed-msg.html)" fame) on a Linux box? I don't know what, but _something_ is using it. Or maybe less anecdotally, the core of the TeX program is all but frozen (a single bug fixed since 2008, according to [this log](http://mirrors.ctan.org/systems/knuth/dist/errata/errorlog.tex)). – Marc van Leeuwen Aug 29 '15 at 12:29
  • Software that doesn't change very often, isn't being used very often. Better? – Engineer Aug 29 '15 at 18:49
  • 1
    Software that doesn't change, was well scoped – Glenn Teitelbaum Sep 03 '15 at 20:42
1

Changing the code without breaking it requires reading and understanding the current code. Relying on undefined behavior hurts readability: If I can't look it up, how am I supposed to know what the code does?

While portability of the program might not be an issue, portability of the programmers might be. If you need to hire someone to maintain the program, you'll want to be able to look simply for a '<language x> developer with experience in <application domain>' that fits well into your team rather than having to find a capable '<language x> developer with experience in <application domain> knowing (or willing to learn) all the undefined behavior intrinsics of version x.y.z on platform foo when used in combination with bar while having baz on the furbleblawup'.

das-g
  • 9,718
  • 4
  • 38
  • 80
1

Nothing is changing but the code, and the UB is not implementation-defined.

Changing the code is sufficient to trigger different behavior from the optimizer with respect to undefined behavior, and so code that may have worked can easily break due to seemingly minor changes that expose more optimization opportunities, for example a change that allows a function to be inlined. This is covered well in What Every C Programmer Should Know About Undefined Behavior #2/3, which says:

While this is intentionally a simple and contrived example, this sort of thing happens all the time with inlining: inlining a function often exposes a number of secondary optimization opportunities. This means that if the optimizer decides to inline a function, a variety of local optimizations can kick in, which change the behavior of the code. This is both perfectly valid according to the standard, and important for performance in practice.
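
To make the quoted point concrete, here is a minimal sketch (not the blog's exact example) of how inlining can expose a null-pointer dereference and let the optimizer delete a later check:

```
static int get(int* p) { return *p; }   // dereferences unconditionally

int read_or_default(int* ptr) {
    int value = get(ptr);               // after inlining, *ptr happens here...
    if (ptr == nullptr)                 // ...so the optimizer may conclude ptr is
        return 0;                       // non-null and fold this branch away
    return value;
}
```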

Compiler vendors have become very aggressive with optimizations around undefined behavior and upgrades can expose previously unexploited code:

The important and scary thing to realize is that just about any optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will keep getting better, and a significant part of their reason for existing is to expose secondary optimizations like the ones above.

Shafik Yaghmour
  • 154,301
  • 39
  • 440
  • 740