First, I feel like we need to get some terms straight, at least with respect to C.
From the C2011 online draft:
Undefined behavior - behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements. Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Unspecified behavior - use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance. An example of unspecified behavior is the order in which the arguments to a function are evaluated.

Implementation-defined behavior - unspecified behavior where each implementation documents how the choice is made. An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
The key point above is that unspecified behavior means that the language definition provides multiple values or behaviors from which the implementation may choose, and there are no further requirements on how that choice is made. Unspecified behavior becomes implementation-defined behavior when the implementation documents how it makes that choice.
This means that there are restrictions on what may be considered implementation-defined behavior.
The other key point is that undefined does not mean illegal; it only means unpredictable. It means you've voided the warranty, and anything that happens afterward is not the responsibility of the compiler implementation. One possible outcome of undefined behavior is code that works exactly as expected with no nasty side effects, which, frankly, is the worst possible outcome: it means that as soon as something in the code or environment changes, everything could blow up and you'll have no idea why (I've been in that movie a few times).
Now to the question at hand:
I also know that on some architectures ("segmented machine" as I read somewhere), there are good reasons that the behavior is undefined.
And that's why it's undefined everywhere. There are some architectures still in use where different objects can be stored in different memory segments, and any differences in their addresses would be meaningless. There are just so many different memory models and addressing schemes that you cannot hope to define a behavior that works consistently for all of them (or the definition would be so complicated that it would be difficult to implement).
The philosophy behind C is to be maximally portable to as many architectures as possible, and to do that it imposes as few requirements on the implementation as possible. This is why the standard arithmetic types (int, float, etc.) are defined by the minimum range of values they must represent with a minimum precision, not by the number of bits they take up. It's also why pointers to different types may have different sizes and alignments.
Adding language that would make some behaviors undefined on this list of architectures vs. unspecified on that list of architectures would be a headache, both for the standards committee and for compiler implementors. It would mean adding a lot of special-case logic to compilers like gcc, which could make them less reliable.