
The C standard makes clear that a compiler/library combination is allowed to do whatever it likes with the following code:

#include <stdlib.h>

int doubleFree(char *p)
{
  int temp = *p;
  free(p);
  free(p);  /* second free of the same pointer: undefined behavior */
  return temp;
}

In the event that a compiler does not require use of a particular bundled library, however, is there anything in the C standard which would forbid a library from defining a meaningful behavior? As a simple example, suppose code were written for a platform which had reference-counted pointers, such that following p = malloc(1234); __addref(p); __addref(p); the first two calls to free(p) would decrement the counter but not free the memory. Any code written for use with such a library would naturally work only with such a library (and the __addref() calls would likely fail on most others), but such a feature could be helpful in many cases, e.g. when it is necessary to pass a string repeatedly to a method which expects to be given a string produced with strdup and consequently calls free on it.
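To make the hypothetical concrete, here is a minimal sketch of what such a reference-counting layer might look like. It is purely illustrative: rc_malloc, rc_addref, and rc_free stand in for the library's malloc, __addref, and free, and the header layout (including the alignment union) is an assumption, not any real library's design.

#include <stdlib.h>
#include <stddef.h>  /* max_align_t (C11) */

/* Hypothetical bookkeeping header prepended to each allocation; the union
   with max_align_t keeps the payload suitably aligned. */
typedef union { size_t refs; max_align_t align_; } rc_header;

void *rc_malloc(size_t n)            /* stands in for the library's malloc */
{
  rc_header *h = malloc(sizeof *h + n);
  if (h == NULL) return NULL;
  h->refs = 1;                       /* the returned pointer is one reference */
  return h + 1;                      /* hand out the payload, not the header */
}

void rc_addref(void *p)              /* stands in for __addref */
{
  if (p != NULL) ((rc_header *)p - 1)->refs++;
}

void rc_free(void *p)                /* stands in for the library's free */
{
  if (p == NULL) return;
  rc_header *h = (rc_header *)p - 1;
  if (--h->refs == 0)                /* only the last release frees memory */
    free(h);
}

Under this scheme, after p = rc_malloc(1234); rc_addref(p); rc_addref(p); the first two calls to rc_free(p) merely decrement the counter, and the third actually releases the block.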

In the event that a library would define a useful behavior for some action like double-freeing a pointer, is there anything in the C standard which would authorize a compiler to unilaterally break it?

supercat
  • Doesn't the answer follow from the definition of undefined behavior? Am I missing something here? – Shafik Yaghmour Apr 23 '15 at 17:16
  • Undefined behavior is undefined. – Billy ONeal Apr 23 '15 at 17:19
  • "Undefined" means just that. There was a time when GCC implemented the #pragma directive by starting the game "rogue". – Lee Daniel Crocker Apr 23 '15 at 17:20
  • @ShafikYaghmour: Historically, the fact that calling `free(p)` twice was Undefined Behavior meant that the standard imposed no restrictions on how the *function* would behave. It in no way forbade library authors from offering guarantees *beyond* those mandated by the standard; indeed, many libraries do offer additional guarantees in many cases. – supercat Apr 23 '15 at 17:20
  • The compiler could certainly define whatever behavior its implementors want. There are lots of compilers that don't even recognize standard C, like Java compilers. – mpez0 Apr 23 '15 at 17:20
  • @supercat that is what I mean: the definition says `imposes no requirements`, which I feel answers your question. `gcc` and `clang` sure do offer well defined behavior for things formally undefined. – Shafik Yaghmour Apr 23 '15 at 17:21
  • @LeeDanielCrocker: When did "undefined" shift from "You're allowed to do this action if and only if you're happy with whatever the library does" to meaning "This action is forbidden even if the library's behavior would be useful"? Note that systems programming--*which is the whole reason C was invented*--is absolutely impossible under the latter interpretation. – supercat Apr 23 '15 at 17:21
  • I have no idea what you're talking about. "Undefined" has always meant "This text is not a well-formed C program. A C compiler is free to treat it any way it chooses". If you are happy to read the documentation for your particular compiler and make use of some nonstandard feature, that's fine, but know what you are doing. – Lee Daniel Crocker Apr 23 '15 at 17:30
  • @supercat: Undefined behavior is effectively forbidden if you want reliable, portable code. But if your standards are lower than that... – Andrew Henle Apr 23 '15 at 17:33
  • @LeeDanielCrocker: My point is that historically it used to be very common for C compilers to specify behaviors beyond what the standard mandated; one would be hard-pressed to find a mainstream compiler designed between 1990 and 2005 where `-1<<1` wouldn't yield -2. I find disturbing the shift from "Compilers have almost universally come to recognize that having `-1<<1` yield -2 is the most sensible thing for that expression to do" to "Compilers should avoid doing anything beyond what the standard requires, regardless of recent historical practice". – supercat Apr 23 '15 at 17:35
  • @LeeDanielCrocker: Further, systems programming without Undefined Behavior is impossible, since the standard doesn't define all the tools necessary to accomplish it. Historically, that hasn't been a problem, since compilers would let Undefined Behavior do whatever it would do on the underlying platform, but without that ability or any defined alternative I don't see how C can be useful for its intended purpose. – supercat Apr 23 '15 at 17:38
  • @supercat I saw you had a related post several days ago but it was a bit broad. I would be interested in a more specific post which goes into a specific instance of a systems programming problem that you feel is impossible without invoking undefined behavior, and see if anyone can come up with a conforming alternative. – Shafik Yaghmour Apr 23 '15 at 17:41
  • @ShafikYaghmour: For starters, casting a number to a pointer is only defined if the number in question has previously been yielded by a cast from a pointer type. Thus, something like `((volatile uint16_t*)0xB8000000)[0] = 0x0C01;` would be Undefined Behavior unless some previous pointer-to-number cast had yielded 0xB8000000. – supercat Apr 23 '15 at 17:43
  • Well clearly then the Linux kernel and Python language must not exist, since system programming is impossible with such loosely defined compiler behavior. – Lee Daniel Crocker Apr 23 '15 at 17:44
  • @ShafikYaghmour: Or else they make use of certain behaviors which, though not defined according to the standard, compilers haven't yet messed with. – supercat Apr 23 '15 at 17:45
  • @LeeDanielCrocker: The set of behaviors which compilers haven't *yet* made useless may be sufficient to allow systems programming, but that doesn't mean that systems programming would be possible on a compiler which could arbitrarily rewrite any and all code whose behavior was not mandated by the standard. – supercat Apr 23 '15 at 17:50
  • So your complaint is purely theoretical and meaningless. Yes, compiler authors are *allowed* to do hideous things. So what? Those who do won't last long. Compilers that do useful things will be the ones that catch on. Standards are always years behind the people actually making things work. – Lee Daniel Crocker Apr 23 '15 at 17:54
  • @LeeDanielCrocker: A number of security vulnerabilities have caused real-world harm as a consequence of compilers excising code which they thought was unnecessary, but which would have--but for such excision--prevented the harm in question. I don't consider that "theoretical" and "meaningless". Specific question: Given the base addresses and sizes of two objects, is there any reasonable way to determine whether they overlap in portable C? – supercat Apr 23 '15 at 17:57
  • That's true, and working programmers have discovered these things, and compiler vendors have made changes. Eventually, standards bodies may recognize some of these changes. Or maybe they won't. Standards bodies only report after-the-fact what those of us doing actual work have done. – Lee Daniel Crocker Apr 23 '15 at 18:02
  • As I understand, the compiler is free to aggressively optimize this once it detects the undefined behaviour, regardless of the standard library implementation of `free`. Though for one second I'd prefer being able to use my glibc fork that completely defies the standards, `free` is a standard-defined function and C compilers have no obligation to work with any conflicting implementations. In this case, the compiler doesn't even have to call the second `free`. – holgac Apr 23 '15 at 18:09
  • @LeeDanielCrocker: regarding the Linux kernel, it is compiled with `-fno-strict-aliasing` (among others, I've just noticed `-fno-delete-null-pointer-checks`), so^W_because_ it is written in a variant of C without C's type-based aliasing rules. – ninjalj Apr 23 '15 at 23:34

3 Answers


There are really two questions here: the one you formally stated, and the broader one outlined in your comments replying to questions raised by others.

Your formal question is answered by the definition of undefined behavior and by section 4 on conformance. The definition says (emphasis mine):

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

With emphasis on nonportable and imposes no requirements. This really says it all: the compiler is free to optimize in unpleasant ways, or it can choose to make the behavior documented and well defined. That of course means the program is no longer strictly conforming, which brings us to section 4:

A strictly conforming program shall use only those features of the language and library specified in this International Standard.2) It shall not produce output dependent on any unspecified, undefined, or implementation-defined behavior, and shall not exceed any minimum implementation limit.

but a conforming implementation is allowed extensions as long as they don't break a strictly conforming program:

A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any strictly conforming program.3)

As the C FAQ says:

There are very few realistic, useful, strictly conforming programs. On the other hand, a merely conforming program can make use of any compiler-specific extension it wants to.
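For instance, a program like the following sketch is conforming but not strictly conforming: __builtin_popcount really is a gcc/clang builtin, but the program built around it here is only an illustration, and it is portable only to implementations providing that extension.

#include <stdio.h>

int main(void)
{
  unsigned x = 0xF0u;
  /* __builtin_popcount is a gcc/clang extension, not standard C, so this
     program's output depends on a compiler-specific feature. */
  printf("%d\n", __builtin_popcount(x));  /* prints 4 */
  return 0;
}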

Your informal question deals with compilers taking ever more aggressive advantage of undefined behavior when optimizing, and the fear that in the long run this will make real-world systems programming impossible. While I do understand how this relatively new aggressive stance seems very programmer-unfriendly to many, in the end a compiler won't last very long if people cannot build useful programs with it. A related blog post by John Regehr: Proposal for a Friendly Dialect of C.

One could argue the opposite: compilers have put a lot of effort into building extensions to support varying needs not covered by the standard. I think the article GCC hacks in the Linux kernel demonstrates this well. It goes into the many gcc extensions that the Linux kernel relies on, and clang has in general attempted to support as many gcc extensions as possible.
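As one example of the pattern that article describes, the kernel's type-safe min() is built from two gcc extensions, statement expressions and typeof. The sketch below is modeled on the kernel's macro rather than copied from it:

/* A statement expression ( ({ ... }) ) plus typeof, both gcc extensions,
   let the macro evaluate each argument exactly once and preserve types. */
#define min(x, y) ({        \
  typeof(x) x_ = (x);       \
  typeof(y) y_ = (y);       \
  x_ < y_ ? x_ : y_; })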

Whether compilers have removed useful handling of undefined behavior in ways that hamper effective systems programming is not clear to me. I think specific questions on alternatives for individual cases of undefined behavior that have been exploited in systems programming and no longer work would be useful and interesting to the community.

Shafik Yaghmour
  • What I find most bothersome is that compiler writers seem more interested in revoking guarantees which most implementations used to provide (e.g. behavior of `-1<<1`) than in providing programmers with ways of relieving the compilers of guarantees they don't need (e.g. on a two's-complement 32-bit system where INT16_MIN==-32768, given `int32_t v=32768; int16_t x=v;`, the compiler would not from what I can tell be allowed to use a 32-bit register for `x` unless it either trims any values stored there or trims any untrimmed values read from there). – supercat Apr 23 '15 at 18:36
  • The optimization opportunities offered by UB pale in comparison to those which programmers could freely offer compilers. Although compilers may not be able to aggressively optimize the code of programmers who aren't satisfied with the size and execution speed they get without such optimization, so what? If a programmer is happier with a program that's 20% larger than it needs to be, but which he's confident will work correctly, why should the compiler writer object? – supercat Apr 23 '15 at 18:43
  • What I gleaned from the answer makes sense. That for a compiler to be meaningful in the world of programming, it must enforce some minimum level of standard and reject extensions that would alter or break that standard. In other words, we all have to play by that minimum set of rules or render a compiler meaningless from a portability or reliability standpoint. The way I see it, that is the function the standards serve. Every so many years, all the neat new ideas have a chance to be considered for inclusion. Some are kept, some rejected, but the standard remains, providing consistency. – David C. Rankin Apr 23 '15 at 18:45
  • @DavidC.Rankin: A fundamental weakness with the C standard, I think, is that it is written purely from a requirements perspective and not from a normative perspective. Consider the following two hypothetical definitions for left-shift of a negative number: (1) An implementation must either implement x< – supercat Apr 23 '15 at 18:57
  • @DavidC.Rankin: Any implementation which complied with the latter could easily be made to comply with the former by, at most, adding a sentence to its documentation. Thus, from a requirements perspective, the second is effectively a concise way of saying the first. On the other hand, the first would allow the language to move toward consistent implementations of useful constructs, while the second gives the present situation. – supercat Apr 23 '15 at 19:03
  • @supercat You bring up a very good point and I think it points to yet another issue. The original being the standard itself, the purpose it serves, and what it imposes on the programmer. The second issue your example touches on is whether the standard, as it currently exists, is sufficient to address the current use of the language in today's programming environment. I've never followed the wranglings of the drafting committee as they revise the standard, but I suspect one side in the debate seeks to prevent broadening the standard in any way that would curtail innovative use of the language. – David C. Rankin Apr 23 '15 at 19:15
  • @DavidC.Rankin: Given that it is no longer necessary that compilers be able to run under resource constraints anything close to those that were common when C was invented, it should be possible, and not overly difficult, to design compilers to offer, for most kinds of operation that are presently undefined, as well as many which are more tightly specified than many applications require, a range of behaviors. Many compilers already do in fact offer such choices, *but there is no standard way of exposing them to the programmer*. – supercat Apr 23 '15 at 19:30
  • @supercat: The most constrained resources are the compiler team members (programmers, debuggers, testers), and those are as limited as ever, for the most part. Most cases where a compiler replaces a working behavior with a more optimized behavior where conformant code continues to work correctly (but code with UB begins to fail) and does not make the old behavior available as an option can be directly attributed to lack of manpower. – Ben Voigt Apr 23 '15 at 20:41
  • @BenVoigt: Can you point me to any research that shows the costs and benefits of various optimizations? I have observed a lot of places looking at compiled code where I'd be more than happy to give a compiler more freedom to optimize things if there were a way to express that (e.g. places where in a memory-constrained system I use `int16_t` or `uint16_t` but would be perfectly happy with a compiler using a 32-bit register without clipping it) or places where I don't care what value would be computed in case of overflow but UB beyond that would be unacceptable. – supercat Apr 23 '15 at 20:48
  • @BenVoigt that is an interesting explanation, but very vocal pushers of more aggressive treatment of undefined behavior don't seem to be making that case at conferences and on mailing lists. Their argument is that this allows for real performance-enhancing optimizations and that the tools will eventually catch up to statically catch these issues before they become security flaws. This position was very strongly present at the last cppcon. – Shafik Yaghmour Apr 23 '15 at 20:48
  • @BenVoigt: I fully appreciate that having precisely-defined overflow semantics is expensive. On the other hand, I would suggest that allowing code to request somewhat-constrained overflow semantics could improve performance by eliminating the need for manual overflow-prevention code which, being explicitly-written code, would have rigidly-defined semantics; this would allow the compiler to optimize around semantics that were looser than what manual code would have specified, but still tight enough to meet a programmer's actual requirements. – supercat Apr 23 '15 at 21:00
  • @Shafik: That explains why they add the new optimization, not why they make it mandatory. Lack of manpower is why they don't support multiple modes. – Ben Voigt Apr 23 '15 at 23:06

In the event that a library would define a useful behavior for some action like double-freeing a pointer, is there anything in the C standard which would authorize a compiler to unilaterally break it?

The compiler and the standard library (i.e. the one in which free is defined) are both part of the implementation - it isn't really coherent to talk about one of them doing something "unilaterally".

If a compiler "does not require use of a particular bundled library", then (other than perhaps as a freestanding implementation) it alone is not an implementation, so the standard doesn't apply to it at all. The behavior of a combination of a library and a compiler are the responsibility of whoever chooses to combine them (which may be the author of either component, or someone else entirely) and label this combination as an implementation. It would, of course, be wise not to document extensions implemented by the library as features of this implementation without confirming that the compiler does not break them. For that matter, you would also need to make sure that the compiler doesn't break anything used internally by the library.


In answer to your main question: no, it does not. If the end result of combining a library and a compiler (and kernel, dynamic loader, etc.) is a conforming hosted environment, it is a conforming implementation even if some extensions that the library's author would like to have provided are not supported by the final result of combining them; nothing requires those extensions to work, either. Conversely, if the result does not conform - for example, if the compiler breaks the internals of the library and thereby causes some library function not to conform - then it is not a conforming implementation. Any program which calls free twice on the same pointer, or uses any reserved identifier starting with two underscores, causes undefined behavior and therefore is not a strictly conforming program.
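To make that last point concrete, here is a sketch of a program that fails to be strictly conforming for both of those reasons at once. It assumes the hypothetical reference-counting library from the question actually declares __addref; it will not link anywhere else.

#include <stdlib.h>

extern void __addref(void *p);  /* hypothetical extension; the name itself
                                   uses a reserved identifier */

int main(void)
{
  char *p = malloc(1);
  if (p == NULL) return 1;
  __addref(p);  /* meaningful only on the hypothetical library */
  free(p);      /* on that library: decrements the reference count */
  free(p);      /* undefined behavior as far as the standard is concerned */
  return 0;
}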

Random832
  • If a non-hosted implementation does not include `malloc/free/realloc` (I don't think they're required in such), would that effectively imply that a program could legitimately use such identifiers in any way it sees fit and it would be improper for a compiler to regard two consecutive calls to `free()` as being any different from two consecutive calls to `fnord()`? – supercat Apr 28 '15 at 18:57
  • @supercat Right, but there's nothing saying that the compiler has to behave the same as part of a hosted implementation as it does as part of a freestanding implementation. Such a restriction would not be coherent, because "compiler" does not exist as a concept in the C standard, and they are two different implementations anyway, whether implemented by two compilers or by one compiler in two different modes. – Random832 Apr 28 '15 at 18:59
  • @supercat on further thought, I don't think there is actually anything in the standard, strictly speaking, that changes the rules about reserved identifiers for programs running on freestanding implementations. So, `free` remains reserved for use as an identifier with external linkage. – Random832 Apr 28 '15 at 19:26

Does the C standard mandate that platforms must not define behaviors beyond those given in the standard?

Quite simply, no, it does not. The standard says:

An implementation shall be accompanied by a document that defines all implementation-defined and locale-specific characteristics and all extensions.

There is no restriction anywhere in the standard that prohibits implementations from providing any other documentation they like. If you like, you can read N1570, the latest freely available draft of the ISO C standard, and confirm the lack of any such prohibition.

In the event that a library would define a useful behavior for some action like double-freeing a pointer, is there anything in the C standard which would authorize a compiler to unilaterally break it?

A C implementation includes both the compiler and the standard library. free() is part of the standard library. The standard does not define the behavior of passing the same pointer value to free() twice, but an implementation is free to define the behavior. Any such documentation is not required, and is outside the scope of the C standard.

If a C implementation documented, for example, that calling free() a second time on the same pointer value has no effect, but then doing so actually caused the program to crash, that would violate the implementation's own documentation, but it would not violate the C standard. There is no specific requirement in the C standard that says an implementation must conform to its own documentation, beyond the documentation that's required by the standard. An implementation's conformance to its own documentation is enforced by the market and by common sense, not by the C standard.
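Note that a program which wants the documented "a second free is harmless" behavior does not need to rely on any implementation's extension. Since the standard defines free(NULL) as a no-op, a call-site macro along the lines of the following sketch (FREE is our illustrative name, not a standard facility) gets the same effect portably:

#include <stdlib.h>

/* free(NULL) is required by the standard to do nothing, so nulling the
   pointer at the call site makes an accidental second FREE(p) a no-op. */
#define FREE(p) do { free(p); (p) = NULL; } while (0)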

Keith Thompson