
We know what undefined behavior is, and we (more or less) know the reasons for most of it (performance, cross-platform compatibility). Assuming a given platform, say 32-bit Windows, can we consider an undefined behavior well-known and consistent across that platform? I understand there is no general answer, so I will restrict the question to two common UBs I see pretty often in production code (code that has been in use for years).

1) Reference. Given this union:

union {
    int value;
    unsigned char bytes[sizeof(int)];
} test;

Initialized like this:

test.value = 0x12345678;

Then accessed with:

for (size_t i = 0; i < sizeof(test.bytes); ++i)
    printf("%d\n", test.bytes[i]);

2) Reference. Given an array of unsigned short, casting its address to (for example) float* and accessing it through that pointer (reference; assume no padding between array members).
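A minimal sketch of that second pattern (the names and the bit pattern are mine, for illustration; this is the construct being asked about, not an endorsement of it):

#include <stdio.h>

int main(void)
{
    /* the bit pattern of 1.0f on a little-endian machine, spread over two shorts */
    unsigned short data[2] = { 0x0000, 0x3f80 };
    float *f = (float *)data;  /* violates the effective-type (strict aliasing) rules;
                                  unsigned short may also be aligned more weakly than float */
    printf("%f\n", *f);        /* typically prints 1.000000 on little-endian x86 */
    return 0;
}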

Is code relying on well-known UBs (like those) working only by chance (given that the compiler may change, and the compiler version certainly will)? Or, even though they are UB for cross-platform code, do they rely on platform-specific details, so that the behavior won't change as long as we don't change platform? Does the same reasoning also apply to unspecified behavior (when the compiler documentation doesn't say anything about it)?

EDIT: According to this post, starting from C99 type punning is merely unspecified, not undefined.

Adriano Repetti
  • If I recollect correctly, using unions in that way is actually *well defined* and mentioned in the specification. – Some programmer dude Dec 11 '14 at 12:00
  • @JoachimPileborg in theory, I'm thinking about [this](http://stackoverflow.com/a/1812359/1207195). – Adriano Repetti Dec 11 '14 at 12:02
  • Side note: Change `4` to `sizeof(int)`, which is not necessarily 4 on every platform out there. – barak manos Dec 11 '14 at 12:03
  • @JoachimPileborg, C90 makes it implementation-defined. C99 is a little vague, but the intent is to allow type-punning (see [DR 283](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm)). – mafso Dec 11 '14 at 12:06
  • As far as I can think of, a 32-bit platform would impose `sizeof(void*) == 4`, but not necessarily `sizeof(int) == 4`. – barak manos Dec 11 '14 at 12:07
  • @barakmanos you're right! – Adriano Repetti Dec 11 '14 at 12:09
  • @AdrianoRepetti, what do you mean, in theory? It is well defined in practice and basically this is what `union`s are made for. BTW, the accepted answer in the question that you are pointing to is simply wrong. – Jens Gustedt Dec 11 '14 at 12:11
  • In the C11 standard it's footnote 95 (in §6.5.2.3): "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called ‘‘type punning’’)." – Some programmer dude Dec 11 '14 at 12:11
  • @JensGustedt I thought they were there just to _share_ the same memory address. I know they're overused for punning, but C99 also says _"This might be a trap representation"_. See also [this post](http://stackoverflow.com/a/11640603/1207195). That's where my confusion comes from. It's widely used, but I don't understand where the standard says it's perfectly allowed and whether, even if undefined, it is _safe_ (for both UBs (?) I pointed out). – Adriano Repetti Dec 11 '14 at 12:19
  • Side note #2: In a compiler which is designated for a 32-bit platform **AND** defines `CHAR_BIT` as 16 (i.e., supplied with `limits.h` defining it as 16), `sizeof(void*)` will be 2. That being said, I'm not sure whether or not this combination is viable or even feasible (might be worth posting a question on it). – barak manos Dec 11 '14 at 12:22
  • @barakmanos is it possible for `CHAR_BIT` to be 16 on Windows? (OK, it's something I never ever considered possible, my bad). – Adriano Repetti Dec 11 '14 at 12:23
  • @AdrianoRepetti, Joachim already gave you the citation, what more do you want? – Jens Gustedt Dec 11 '14 at 12:23
  • @AdrianoRepetti: I think it's an SDK definition. I don't think that it is tightly coupled with either the OS or the underlying HW architecture. To be honest, I've never seen `CHAR_BIT` defined as any value other than 8. – barak manos Dec 11 '14 at 12:24
  • What does **UB** mean? – bzeaman Dec 11 '14 at 12:26
  • @BennoZeeman [Undefined Behavior](http://en.wikipedia.org/wiki/Undefined_behavior) – Some programmer dude Dec 11 '14 at 12:26
  • In answer to the question posed in the title: never. – Tom Tanner Dec 11 '14 at 12:32
  • Maybe as a portable `__builtin_unreachable`? ;) – mafso Dec 11 '14 at 12:49
  • @mafso _"The __builtin_unreachable() builtin has completely undefined behavior."_ OMG, who dares to use something documented to be undefined? LOL – Adriano Repetti Dec 11 '14 at 13:00
  • @JoachimPileborg this footnote is present in ISO/IEC 9899:1999 as well. It was added between the final draft and publication. – tab Dec 11 '14 at 15:32
  • @AdrianoRepetti: The purpose of a `__builtin_unreachable()` is to provide a clean way for programmers to tell the compiler what they know. For example, given `if (x < 0 || x>=32000) __builtin_unreachable(); y = x/3;` a 32-bit compiler could replace the latter statement with something like `y = (x*0x5556) >> 16;`. The formula would fail for some values of `x` outside the indicated range, but the `__builtin_unreachable();` call would indicate that the compiler shouldn't care. – supercat Dec 22 '14 at 20:34
  • @AdrianoRepetti: I would consider such an approach much better than having the compiler make inferences based upon possibly-unintentional UB. For example, given `uint16_t q; uint32_t csum;`, a 32-bit compiler encountering the code `if (q==0xFFFF) recCountFF++; csum += q*q;` could legitimately omit the `if` test since there would be no way for it to occur without invoking UB; on most systems, the "natural" behavior of the code if the compiler ignored UB would match the desired behavior, but a compiler which omits the `if` test would totally break that. – supercat Dec 22 '14 at 20:40
  • @supercat thank you, you're more clear than gcc docs!!! – Adriano Repetti Dec 22 '14 at 20:46
  • @AdrianoRepetti: I really wish the authors of the standards would open up a new category of "Implementation Constrained" behavior between Undefined Behavior and Implementation-defined, and move a lot of things like integer overflow, which are presently UB, into that category. An implementation would be required to list things that could happen, but not specify which particular one would, and an implementation could specify that a particular behavior could cause UB, but compilers would have to document all types of IC behavior which could cause UB. – supercat Dec 22 '14 at 20:58
  • @supercat I completely agree. I think we _rely_ on UB pretty (too) often (well, even without knowing it's UB), but we all read our compiler documentation more carefully than the Cx standard (because of its language?). I think many UBs are...undefined just because of the portable nature of the standards; a specific implementation _must_ put fixed points for at least some of them (_it's undefined, but the implementation - is forced to - says it's always A_). – Adriano Repetti Dec 22 '14 at 21:33
  • @AdrianoRepetti: The problem is that while some things are UB because it's not practical for a language implementer to guarantee what will happen if a program does a stray pointer access that could clobber the stack, many things are UB even though the set of "natural" behaviors is relatively small. Given `int32_t f=0x7FFFFFFF*0x7FFFFFFF;`, I would think it reasonable for an implementation to say that unless or until the next time `f` is written, any access to `f` may yield any arbitrary value; repeated accesses may read different values (which may or may not be in range for an `int32`),... – supercat Dec 22 '14 at 21:42
  • ...and a compiler would be allowed to make assumptions about `f` which may or may not be true (e.g. the compiler could assume that adding a value to a positive number wouldn't yield a negative number, regardless of whether the value might appear negative), but still have the behavior fall well short of UB. In particular, unless an implementation is documented as possibly trapping overflow, if the results of computations that overflow are ultimately ignored, the Implementation-Constrained behavior would not be "contagious" to other code which didn't use those computations. – supercat Dec 22 '14 at 21:45
  • @supercat maybe I'm starting to understand. Even if the UB alone _may_ have a well-defined, repeatable result, the compiler will make assumptions (based on good/normal behavior) and these assumptions will affect other code too (even when it's not so obvious - to me - how). That's why _Implementation-Constrained behavior_ is still UB (too many cases where it may be broken by the compiler itself because of, for example, optimizations). Did I catch what you mean? – Adriano Repetti Dec 23 '14 at 07:32
  • @AdrianoRepetti: The expectation with IC would be that compilers would typically be documented as making assumptions about the *results* of computations, but would refrain from making back-inferences about their operands. For example, given `uint16_t x; int32_t y; y=(x*x > 0) + 2*(x > 50000);` a compiler which was documented in typical fashion would be allowed to infer that `x*x` would always be positive, but would not be allowed to infer that `x` must be less than 46341. – supercat Dec 23 '14 at 16:53
  • @AdrianoRepetti: Indeed, I would like to add as an additional aspect of IC behavior some standard #define labels which would be required to indicate certain aspects. Thus, code could say, e.g. `#if !(__OVERFLOW_MODE && __OVERFLOW_AFFECTS_RESULTS_ONLY) #warning This code requires constrained overflow behavior #endif` and be assured that overflow would not cause full-fledged UB. – supercat Dec 23 '14 at 16:56
  • @AdrianoRepetti: Actually, an approach I'd like even better would be to provide a standard means by which code could request certain semantics, both with regard to various kinds of UB, but also with regard to things like integer promotion. Portability would be massively improved if code could say e.g. "Within this stretch of code, I want integers to always promote to 32 bits but not beyond"; refuse compilation if the request cannot be honored. It may be that on some 16-bit or 64-bit machines that directive would make code run slower... – supercat Dec 23 '14 at 17:01
  • ...but code which slowly performs the computation that's required will be better than code which quickly performs some other computation which won't yield the correct result. Some similar features could be helpful with regard to overflow--have a directive that requires that within a stretch of code signed overflow must yield strict two's-complement result, and another that would allow even unsigned overflow to go into arbitrary-but-constrained or fully-undefined behaviors. – supercat Dec 23 '14 at 17:01
  • @supercat and the standard should define, for each UB with a reasonable IC, a small list of #defines we may inspect. D*** nice idea! It may also force compilers (I mean..._all_ compilers) to emit proper warnings for many unnoticed UBs (at least when pedantic). Do you mind posting all you wrote here as an answer? – Adriano Repetti Dec 23 '14 at 17:58
  • @AdrianoRepetti: See my answer and tell me what you think. – supercat Dec 23 '14 at 18:46

2 Answers


Undefined behavior means primarily a very simple thing: the behavior of the code in question is not defined, so the C standard doesn't give any clue as to what can happen. Don't read more into it than that.

If the C standard doesn't define something, your platform may well do so as an extension. If you are in such a case, you can use it on that platform. But then make sure that the extension is documented, and that it doesn't change in the next version of your compiler.

Your examples are flawed for several reasons. As discussed in the comments, unions are made for type punning, and in particular accessing memory through any character type is always allowed. Your second example is really bad, because contrary to what you seem to imply, this is not an acceptable cast on any platform that I know. short and float generally have different alignment properties, and using such a cast will almost certainly crash your program. Third, you are arguing about C on Windows, which is known for not following the C standard.
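As a sketch of that distinction (assuming a little-endian x86 target; the program is mine, for illustration): inspecting any object's bytes through an unsigned char pointer is always permitted, whereas the short-to-float cast is not.

#include <stdio.h>

int main(void)
{
    float f = 1.0f;
    /* accessing an object's representation through unsigned char* is
       explicitly allowed by the standard */
    const unsigned char *p = (const unsigned char *)&f;
    for (size_t i = 0; i < sizeof f; ++i)
        printf("%02x ", p[i]);  /* typically "00 00 80 3f" on little-endian x86 */
    putchar('\n');
    return 0;
}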

Jens Gustedt
  • 1) OK, type punning with a union is [unspecified](http://stackoverflow.com/a/6725981/1207195) (starting from C99). BTW, aren't unions made to share the same memory? Type punning came after – UB for over 20 years... 2) Is an array of short padded [BETWEEN](http://stackoverflow.com/q/1676385/1207195) elements, and not just at the end? 3) That's it, but it's the platform I would restrict this question to (primarily because it runs on a smaller variety of hardware architectures). – Adriano Repetti Dec 11 '14 at 12:40
  • 1) Why do you think this is unspecified? The standard explicitly allows access to the different fields of a `union`. Type punning has been there since the very beginning of C, in particular in Unix, for which C was designed. 2) Arrays are never padded. 3) I don't understand your sentence. – Jens Gustedt Dec 11 '14 at 15:33
  • 1) From the linked post. It was there, but for sure it was UB before C99 (did it always work? Maybe...that's what I'm asking here!) 2) Right...then alignment isn't an issue (on x86; I'm limiting my question to a single architecture). 3) On Windows things are sometimes _strange_, but it's where I saw such code (even if I suppose it isn't limited to the Win32 architecture, as the posts about punning suggest) and the environment I would limit my question to. – Adriano Repetti Dec 11 '14 at 16:03
  • @AdrianoRepetti, for (2), alignment requirements constrain on which boundaries an address may lie. A `short` usually has 2 bytes, so the usual requirement is that the least significant address bit be `0`. A `float` usually has 4 bytes, so the two lower bits must generally be `0`. – Jens Gustedt Dec 11 '14 at 18:50
  • 2) I mean: a common example is writing through a char* (reading from a stream, for example) and then reading through an int* (but what I mean applies to all other compatible types, assuming the value won't trap). On x86 alignment isn't an issue (it may slow down access, but it's allowed), so what I would like to understand is whether it's UB but _stable_ enough to be used (assuming the platform won't change). Primarily because I saw _so much_ code like that... – Adriano Repetti Dec 11 '14 at 21:19

First of all, any compiler implementation is free to define any behavior it likes in any situation which would, from the point of view of the standard, produce Undefined Behavior.

Secondly, code which is written for a particular compiler implementation is free to make use of any behaviors which are documented by that implementation; code which does so, however, may not be usable on other implementations.

One of the longstanding shortcomings of C is that while there are many situations where constructs which could produce Undefined Behavior on some implementations are handled usefully by others, only a tiny minority of such situations provide any means by which code can specify that a compiler which won't handle them a certain way should refuse compilation. Further, there are many cases in which the Standards Committee allows full-on UB even though on most implementations the "natural" consequences would be much more constrained. Consider, for example (assume int is 32 bits):

#include <stdint.h>

int weird(uint16_t x, int64_t y, int64_t z)
{
  int r=0;
  if (y > 0) return 1;
  if (z < 0x80000000L) return 2;
  if (x > 50000) r |= 31;
  if (x*x > z) r |= 8;   /* x*x overflows a 32-bit int once x exceeds 46340 */
  if (x*x < y) r |= 16;
  return r;
}

If the above code were run on a machine that simply ignores integer overflow, passing 50001, 0, 0x80000000L should result in the code returning 31; passing 50000, 0, 0x80000000L could result in it returning 0, 8, 16, or 24, depending upon how the code handles the comparison operations. The C standard, however, would allow the code to do anything whatsoever in any of those cases. Because of that, some compilers might determine that none of the if statements beyond the first two could ever be true in any situation which hadn't invoked Undefined Behavior (any x greater than 46340 would make x*x overflow), and might thus assume that r is always zero. Note that one of the inferences would affect the behavior of a statement which precedes the Undefined Behavior.
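If the intent of such code is two's-complement wrap-around, one way to make it well defined (a sketch of mine, not part of the original answer) is to perform the multiplication in an unsigned type and convert back: unsigned wrap-around is fully defined, and the conversion back to a signed type is implementation-defined rather than undefined (two's-complement wrap on mainstream compilers):

#include <stdint.h>

int weird_fixed(uint16_t x, int64_t y, int64_t z)
{
  int r = 0;
  if (y > 0) return 1;
  if (z < 0x80000000L) return 2;
  if (x > 50000) r |= 31;
  /* unsigned multiplication wraps modulo 2^32; converting the result back
     to int32_t is implementation-defined, not undefined */
  int32_t xx = (int32_t)((uint32_t)x * x);
  if (xx > z) r |= 8;
  if (xx < y) r |= 16;
  return r;
}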

One thing I'd really like to see would be a concept of "Implementation Constrained" behavior, which would be something of a cross between Undefined Behavior and Implementation-Defined Behavior: compilers would be required to document all possible consequences of certain constructs which under the old rules would be Undefined Behavior, but--unlike Implementation-Defined behavior--an implementation would not be required to specify one specific thing that would happen. Implementations would be allowed to specify that a certain construct may have arbitrary unconstrained consequences (full UB), but would be discouraged from doing so. In the case of something like integer overflow, a reasonable compromise would be to say that the result of an expression that overflows may be a "magic" value which, if explicitly typecast, will yield an arbitrary (and "ordinary") value of the indicated type, but which may otherwise appear to have arbitrarily changing values which may or may not be representable. Compilers would be allowed to assume that the result of an operation will not be a result of overflow, but would refrain from making inferences about the operands. To use a vague analogy, the behavior would be similar to how floating-point would behave if explicitly typecasting a NaN could yield any arbitrary non-NaN result.

IMHO, C would greatly benefit from combining the above concept of "implementation-constrained" behaviors with some standard predefined macros which would allow code to test whether an implementation makes any particular promises about its behavior in various situations. Additionally, it would be helpful if there were a standard means by which a section of code could request a particular "dialect" [combination of int size, implementation-constrained behaviors, etc.]. It would be possible to write a compiler for any platform which could, upon request, make promotion rules behave as though int were exactly 32 bits. For example, given code like:

uint64_t l1, l2;
uint32_t w1, w2;
uint16_t h1, h2;
...
l1 += (h1 + h2);
l2 += (w2 - w1);

A 16-bit compiler might be fastest if it performed the math on h1 and h2 using 16 bits, and a 64-bit compiler might be fastest if it added to l2 the 64-bit result of subtracting w1 from w2; but if the code was written for a 32-bit system, being able to have compilers for the other two systems generate code which would behave as it would have on the 32-bit system would be more helpful than having them generate code which performed some different computation, no matter how much faster the latter code would be.
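Pending such a mechanism, the closest approximation available today (a sketch; the helper function is mine, for illustration) is to spell out the intended width with explicit casts, since uint32_t arithmetic is fully defined regardless of the native int size:

#include <stdint.h>

void accumulate(uint64_t *l1, uint64_t *l2,
                uint32_t w1, uint32_t w2,
                uint16_t h1, uint16_t h2)
{
    /* force both computations to behave as they would with a 32-bit int */
    *l1 += (uint32_t)h1 + (uint32_t)h2;  /* operands widened before the add */
    *l2 += (uint32_t)(w2 - w1);          /* difference wraps at 32 bits even
                                            if int is 64 bits */
}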

Unfortunately, there is not at present any standard means by which code can ask for such semantics [a fact which will likely limit the efficiency of 64-bit code in many cases]; the best one can do is probably to expressly document the code's environmental requirements somewhere and hope that whoever is using the code sees them.

supercat
  • Impl-constrained sounds vaguely similar to unspecified. – Damian Yerrick Jul 31 '19 at 15:07
  • @DamianYerrick: Impl. constrained would be equivalent to Unspecified on implementations that indicate that they will uphold behavioral guarantees, and Undefined on implementations that indicate that they make no such promises. – supercat Jul 31 '19 at 15:16