103

If I have:

unsigned int x;
x -= x;

it's clear that x should be zero after this statement, but everywhere I look, they say the behavior of this code is undefined, not merely the value of x before the subtraction.

Two questions:

  • Is the behavior of this code indeed undefined?
    (E.g. Might the code crash [or worse] on a compliant system?)

  • If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?

    i.e. What is the advantage given by not defining the behavior here?

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?

Sourav Ghosh
user541686
  • 3
    possible duplicate of [Why does the C standard leave use of indeterminate variables undefined?](http://stackoverflow.com/questions/3248118/why-does-the-c-standard-leave-use-of-indeterminate-variables-undefined) – jscs Aug 14 '12 at 23:51
  • @W'rkncacnter: If you look at the answer there, it's answering a slightly different question (why C doesn't initialize variables), *not* why the behavior is undefined. – user541686 Aug 14 '12 at 23:52
  • 9
    @W'rkncacnter I disagree with that being a dupe. Regardless of whether what value it takes, the OP expects it to be zero after `x -= x`. The question is *why* accessing uninitialized values at all is UB. – Mysticial Aug 14 '12 at 23:53
  • 6
    It's interesting that the statement `x=0;` is typically converted to `xor x,x` in assembly. It's almost the same as what you are trying to do here, but with xor instead of subtraction. – 0xFE Aug 14 '12 at 23:57
  • There's also [What happens to a declared, uninitialized variable in C -- does it have a value](http://stackoverflow.com/questions/1597405/what-happens-to-a-declared-uninitialized-variable-in-c-does-it-have-a-value), whose accepted answer definitely does address UB. – jscs Aug 14 '12 at 23:57
  • 2
    'i.e. What is the advantage given by not defining the behavior here? ' -- I would have thought that the advantage of the standard not listing the infinity of expressions with values that don't depend on one or more variables to be obvious. At the same time, @Paul, such a change to the standard would not make programs and libraries any bigger. – Jim Balter Aug 15 '12 at 00:12
  • Similar: http://stackoverflow.com/questions/25074180/is-aa-or-a-a-undefined-behaviour-if-a-is-not-initialized/25074258#25074258 – M.M Feb 16 '15 at 23:22
  • @MattMcNabb: You should probably link the other one to this one, considering this one came 2 years earlier. – user541686 Feb 16 '15 at 23:34
  • @Mehrdad OK, did a comment link. Both questions have good and established answers so closing as duplicate is probably not appropriate, although perhaps a moderator could do a merge. – M.M Feb 16 '15 at 23:44
  • @MattMcNabb: Yeah we can probably leave them as is, they're not quite duplicates I think. – user541686 Feb 16 '15 at 23:47
  • @JimBalter: Allowing indeterminate values to behave strangely can allow useful optimizations. For example, given `uint16_t foo(void) {uint16_t result; `, followed by various statements, each of which may or may not write result and then `return result;}`, it may be helpful to have the compiler keep `result` in a 32-bit register and then return that. If anything stores a value to result, the compiler will ensure the value stored is 0..65535, but if nothing writes to `result`, keeping the return value within that range would require adding an extra instruction. – supercat Aug 14 '16 at 19:47
  • @supercat One of your typical 4 year late non sequiturs. My comment was specifically about "expressions with values that don't depend on one or more variables" -- in this case, `x - x`. Were the Standard to specify that `uint16_t foo(void) {uint16_t result; result -= result; return result;}` returns 0, this would not make *conformant* programs and libraries bigger. We don't worry about **buggy** code producing larger binaries. We do want the compiler to be able to optimize *conformant* programs by taking advantage of undefined behavior, and the added specification wouldn't change that. – Jim Balter Aug 15 '16 at 01:26
  • This question was discussed on HackerNews, with responses from C experts, at https://news.ycombinator.com/item?id=22867059 – Max Barraclough Apr 18 '20 at 13:12
  • @MaxBarraclough: Wow, thanks a ton for sharing. [This page](http://blog.frama-c.com/index.php?post/2013/03/13/indeterminate-undefined) they linked to was pretty enlightening. So, for anyone else reading this, the tl;dr seems to be that (a) the code is undefined; (b) if you take the address of the source, then it's unclear according to the standard whether it'd be undefined, but (c) compilers treat that as undefined too, so we might as well. – user541686 Apr 18 '20 at 13:41
  • @user541686 I don't think your b) is accurate, see my comments in the HackerNews thread. Also see the comment by msebor, a C expert, which makes no mention of taking the address. – Max Barraclough Apr 18 '20 at 16:40
  • 1
    @MaxBarraclough: I saw his comments; they don't contain any quotes from the standard to back them up, whereas people here have been quoting the standard. Note that another similar C expert there actually misremembered what the standard said about type-punning, and someone had to correct him. Did you see [this comment](https://stackoverflow.com/questions/11962457?noredirect=1#comment56579444_11965368) below? It said this question was on the C committee's mailing list in 2015 and there was disagreement between the spec and their intentions. I think my summary captured it pretty darn accurately.. – user541686 Apr 18 '20 at 22:38
  • 1
    Does this answer your question? [Why does the C standard leave use of indeterminate variables undefined?](https://stackoverflow.com/questions/3248118/why-does-the-c-standard-leave-use-of-indeterminate-variables-undefined) – NAND May 16 '20 at 00:20

7 Answers

106

Yes, this behavior is undefined, but for different reasons than most people are aware of.

First, using an uninitialized value is not by itself undefined behavior; the value is simply indeterminate. Accessing it is then UB only if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.

What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register", that is, its address is never taken. Such variables are treated specially because there are architectures with real CPU registers that have a sort of extra "uninitialized" state that doesn't correspond to any value in the type domain.

Edit: The relevant phrase of the standard is 6.3.2.1p2:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

And to make it clearer, the following code is legal under all circumstances:

unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
  • Here the addresses of a and b are taken, so their value is just indeterminate.
  • Since unsigned char never has trap representations that indeterminate value is just unspecified, any value of unsigned char could happen.
  • At the end a must hold the value 0.

Edit2: a and b have unspecified values:

3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

Edit3: Some of this will be clarified in C23, where the term "indeterminate value" is replaced by the term "indeterminate representation" and the term "trap representation" is replaced by "non-value representation". Note also that all of this is different between C and C++, which has a different object model.

Jens Gustedt
  • 1
    Regarding your last point: I see why it's *reasonable* that `a` becomes `0` but how is that *guaranteed* by the standard? Before the assignment the value of `a` is indeterminate. Doesn't that include that two accesses might return different values? Or does the C standard guarantee that an indeterminate value stays the same indeterminate value between two accesses? – Nikolai Ruhe Aug 24 '12 at 20:36
  • @NikolaiRuhe. No it is not indeterminate, it is unspecified. Basically this means that the standard doesn't impose any particular value but it has a valid value in that range, usually this corresponds just to the bit pattern that is found at that address. This value is subtracted from itself, so result is `0`. – Jens Gustedt Aug 24 '12 at 21:06
  • 1
    The standard requires that writing to a variable must cause all of its `unsigned char` constituent parts to be written with non-trap values. Does it require that variables which are not written hold non-trap forms? I would think a compiler running on a machine with parity-checked memory (e.g. the original IBM PC) should be allowed to fill undefined memory with trap values if it were so inclined, such that any fetch would trigger a trap. – supercat Jan 12 '13 at 16:23
  • @supercat, a particular bit pattern may constitute a trap value only for a particular type, and be a regular value when interpreted as another type. So yes, under such architecture that you describe the individual bytes wouldn't be traps, and the composition of all these bytes when interpreted as `int` could be a trap. If with "trigger a trap" you mean "raise an implementation defined signal", then yes, an implementation could implement `int` like that. – Jens Gustedt Jan 12 '13 at 19:21
  • 6
    Perhaps I'm missing something, but it seems to me that `unsigned`s can sure have trap representations. Can you point to the part of the standard that says so? I see in §6.2.6.2/1 the following: "For unsigned integer types other than **unsigned char**, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). ... this shall be known as the value representation. The values of any padding bits are unspecified. ⁴⁴⁾" with the comment saying: "⁴⁴⁾ Some combinations of padding bits might generate trap representations". – conio Dec 18 '13 at 01:06
  • 6
    Continuing the comment: "Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit. Regardless, no arithmetic operation on valid values can generate a trap representation other than as part of an exceptional condition such as an overflow, and this cannot occur with unsigned types." - That's great **once** we have a valid value to work with, but the indeterminate value **might** be a trap representation before being initialized (e.g. parity bit set wrong). – conio Dec 18 '13 at 01:09
  • 5
    @conio You're correct for all types other than `unsigned char`, but this answer is using `unsigned char`. Note though: a strictly conforming program can calculate `sizeof(unsigned) * CHAR_BIT` and determine, based on `UINT_MAX`, that particular implementations cannot possibly have trap representations for `unsigned`. After that program has made that determination, it can then proceed to do exactly what this answer does with `unsigned char`. –  Nov 27 '14 at 11:38
  • Can you explain how is that memcpy defined in regards with the first standard rule (6.3.2.1p2) you posted. I think your reasoning is not correct because you think that if an automatic variable has its address actually taken then it is exempt from the rule. My reasoning; even it's address is taken it still could have been declared with register, even if it wasn't in this case, therefore the behavior is undefined. The rule applies to any automatic object: *that could have been declared with the register storage class*. That doesn't mean it has to be. (I hope my comment was clear.) Thoughts? – this Nov 17 '15 at 11:44
  • 5
    @JensGustedt: Isn't the `memcpy` a distraction, i.e. wouldn't your example still apply if it were replaced by `*&a = *&b;`. – R.. GitHub STOP HELPING ICE Dec 21 '15 at 21:28
  • 6
    @R.. I am not sure anymore. There is an ongoing discussion on the mailing list of the C committee, and it seems that all of this is a big mess, namely a large gap between what is (or has been) intended behavior and what is actually written up. What is clear though, is that accessing the memory as `unsigned char` and thus `memcpy` helps, the one for `*&` is less clear. I'll report once this settles down. – Jens Gustedt Dec 22 '15 at 15:27
  • Just to add to the discussion: https://blogs.msdn.microsoft.com/oldnewthing/20040119-00/?p=41003. As far as I understand, UB trumps all other guarantees, including the guarantee that `unsigned char` has no trap representation. – Vlad Jan 21 '16 at 17:08
  • @Vlad, I don't have the impression that this has much to say. First it seems to be mostly about C++, and that is certainly different, here. And then this seem to be MS compilers, no? As I said, the intent expressed by the C committee seems to be that access as bytes (any of the character types) always has defined behavior. – Jens Gustedt Jan 22 '16 at 07:49
  • @Jens: Okay. There is a discussion about NaT [here](http://stackoverflow.com/q/26451954/276994), [this comment](http://stackoverflow.com/q/26451954/#comment41558004_26451954) suggests that NaT is not a trap representation. Given your claim "Accessing [indeterminate value] then is UB if the value happens to be a trap representation for the type" (did you mean "if and only if"?), there seems to be a contradiction. [There is a [linked defect report](http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_338.htm), suggesting changes about `unsigned char` guarantees.] – Vlad Jan 22 '16 at 14:55
  • @Jens: So all of this might still be related to the discussed topic. – Vlad Jan 22 '16 at 14:56
  • 2
    The NaT is a state of a hardware register. I think this is the origin of this idea of an object *"could have been declared with the register storage class"*. As soon as you access the data through memory as bytes, it can't have the NaT state, that's the whole idea. – Jens Gustedt Jan 25 '16 at 07:47
  • @JensGustedt: Do you happen to have a link to the C committee mailing list email that you mentioned? (if it's publicly visible) – user541686 Jan 25 '16 at 21:16
  • @Vlad: There are times when useful optimizations could be achieved by allowing a read of an Indeterminate value to behave in a fashion contrary to any defined behavior for values of that type, even for types like `uint16_t` where every possible bit pattern for the underlying storage would have defined behavior. If such things aren't trap representations, what else could they be? – supercat Aug 13 '16 at 19:16
  • 1
    @supercat: Well, depends on the exact Standard wording. If the Standard requires that every `uint16_t` value is either a valid bit combination or a trap representation _even in the presence of undefined behavior_, than you are right. If however UB voids all the other requirements, then it can be anything including a nasal demon instance. – Vlad Aug 13 '16 at 21:26
  • @Vlad: If behavior would be defined unless an lvalue is read, but reading the lvalue might yield behavior inconsistent with its type (e.g. a `uint16_t` holding 65536), that would imply that the act of reading the lvalue would trigger UB. To me, that would in turn suggest that the lvalue held a trap representation. – supercat Aug 13 '16 at 21:45
  • @this "even it's (sic) address is taken it still could have been declared with register" -- no, it couldn't. Just try reading the part of the Standard quoted, which reflects the constraints on the `&` operator: "could have been declared with the register storage class (never had its address taken)" – Jim Balter Aug 15 '16 at 01:44
  • 2
    In at least the draft of the C11 standard, Annex J.2 includes "the value of an object with automatic storage duration is used while it is indeterminate" in the list of undefined behavior. Now this annex isn't normative, and it's not clear that the standards body agrees at the cited sections, so maybe it is claiming too much in J.2. Is that your position? Because I read J.2 as saying that even the `memcpy` example would have UB. – EvanED Oct 12 '16 at 17:26
  • 3
    After reading more, the story gets even more complicated. The C Committee Response to [Defect Report #451](http://www.open-std.org/Jtc1/sc22/WG14/www/docs/dr_451.htm) (and #260, linked there) indicate that indeterminite values are allowed to appear to change without direct actions of the program. That and other statements in the committee response would, I'd imagine, mean that `a -= a` would still result in an indeterminite value even if it's not true UB. Do you disagree, and think I'm off base there? – EvanED Oct 12 '16 at 18:17
  • In your example the result will be unspecified, and not 0. See: http://www.open-std.org/Jtc1/sc22/WG14/www/docs/dr_451.htm Note that this also applies for unspecified values. – 2501 Nov 14 '16 at 10:09
  • @EvanED: What is needed to allow optimization without losing semantics is a recognition of non-deterministic values and ways of forcing them partially or fully determinate. I think it unfortunate that while some people think that if x is indeterminate, x & 15 should be fully determinate, others think it should be fully indeterminate. The former would impede optimizations more than necessary, while the latter would force programmers to clutter their source with code to block optimizations more than necessary. The solution IMHO would be to say... – supercat Nov 23 '16 at 16:50
  • ...that a variable of type X holds *at least* one value of type X, but might hold more; if x and y are both of type uint32_t, then (x & y) would be allowed to yield any non-empty subset of the values formed by combinations of possible values for x and y. If x and y start out fully indeterminate, then after "xx = x & 3;" xx would hold one or more of {0,1,2,3} and after "yy = y & 10;", yy would hold one or more of {0,2,8,10}. The expression xx+yy would then yield one or more of {0,1,2,3,4,5,8,9,10,11,12,13}. While it might seem hard for compilers to track that... – supercat Nov 23 '16 at 16:54
  • ...the main usefulness of indeterminate values would be to allow for compilers to use symbolic substitution to reorder operations, so that if e.g. a compiler which is given something "z=xx+yy;" followed sometime later by "w=z;" and later still by another "w=z;" it might replace the latter assignments with "w=(x & 3)+(y & 10);". If "x" or "y" changes in unexpected fashion, that might cause the two assignments to store different values, but it wouldn't cause any value outside the aforementioned set. – supercat Nov 23 '16 at 16:59
  • @2501 in fact, under DR 451, `a -= a` results in `a` still being indeterminate (not merely unspecified): under that resolution, the apparent value is unspecified at each observation (aka. "wobbly") – M.M Mar 21 '17 at 08:47
  • @M.M The report says this: *From 3.19.2 it follows that if a type has no trap values, then indeterminate and unspecified values are the same. And in 3.19.3, it is stated explicitly that an unspecified value is chosen. Which implies that the value - after having been chosen - cannot change anymore.* This is wrong. An unspecified value can clearly can change at any 'observation': *3.19.3 1 unspecified value valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance* – 2501 Mar 21 '17 at 09:05
  • @M.M: The more interesting question is the effect of `a=a; a -= a;`. If the second statement were performed in isolation, the two reads of `a` might potentially yield different values, since even after the first read nothing would have "set" the value of `a`. If a read of an Indeterminate value is guaranteed to yield some particular arbitrary value, then after `a=a;`, `a` should hold some possibly-unknown but no longer Unspecified value, so the subtract should yield 0. Unfortunately, some compilers don't recognize any way of forcing the compiler to turn a "wobbly" value into a usable one. – supercat May 10 '17 at 22:43
  • 1
    @supercat you can turn a wobbly-valued variable into a usable one by assigning a non-wobbly value to it. There's very little use case for wanting non-wobbly garbage – M.M May 10 '17 at 22:48
  • @M.M: For various sparse-array and hash table algorithms do a key lookup via `uint32_t index = map[key]; if (index < numItems && values[index]==key) ItemFound(...) else ItemNotFound(...);` If `key` isn't in the table, `map[key]` could return *any* non-wobbly `uint32_t` value and code would correctly report that it's not found. If `index` gets assigned a wobbly value, though, there's no way to prevent an out-of-bounds array fetch. – supercat May 11 '17 at 04:27
  • @supercat that's why I said "very little" instead of "none at all". And on modern OSs there is no penalty to making a large zero-initialized allocation. – M.M May 11 '17 at 04:41
  • @M.M: C is often used in freestanding implementations where there is no "modern OS" [or any OS for that matter], or where the purpose of the compiled code is to *be* the OS. – supercat May 11 '17 at 12:57
  • @supercat typically, embedded devices would not require a sparse array so large that the initialization time is a measurable problem. Not saying *never* but it would be a very rare use case. – M.M May 12 '17 at 01:48
  • @M.M: I said *or where the purpose of the compiled code is to **be** the OS*, which could extend up to some rather large systems. In cases where an "optimizing" compiler would require a programmer to force the computer to do otherwise-unnecessary work, the value of any potential optimizations may be negated by the unnecessary work. A compiler that could achieve 90% of the optimizations while requiring 0% of the needless work would allow for more efficient code. – supercat May 26 '17 at 23:16
  • But shouldn't this behavior occur only for auto variables? – AlphaGoku Feb 18 '19 at 05:59
  • @AlphaGoku: Yes, but a function might reasonably create structures of automatic duration without populating all the members. There are purposes where the benefits of being able to statically prove that all struct members will be written would outweigh the performance benefits of omitting writes that wouldn't be necessary at the machine level for a program to meet requirements, but there are others where the performance benefits of eliminating needless writes would be worth more. The Standard is intended to let implementations choose whichever approach would best serve their customers, not... – supercat Aug 04 '21 at 19:11
  • ...to imply any judgment about which approach would be more useful in any particular situation. – supercat Aug 04 '21 at 19:11
  • I don't quite understand the sudden "that could have been declared with the register storage class" in the context of the code shown; it seems to clearly just be automatic storage that is not register. – anonymouscoward Mar 04 '22 at 13:13
  • Oh wait, now I realize, but I don't like it – anonymouscoward Mar 04 '22 at 13:32
  • 1
    The "does not have character type" implies that `uint32_t` (for example) can have trap representation (if width of character type != width of `uint32_t`). However, `uint32_t` is guaranteed to have "no padding bits, and a two’s complement representation" (C11, 7.20.1.1). Then how `uint32_t` can have trap representation? Any examples? – pmor Aug 01 '22 at 17:59
  • You claimed "using an unitialized value is by itself not undefined behavior" but the C Standard says that using an indeterminate value of a variable with automatic storage duration (which this undoubtedly is) always classifies as undefined behavior. See section J.2 "Undefined Behavior". – Ben Voigt Dec 08 '22 at 21:15
  • 2
    @BenVoigt, Annex J is not normative. And in fact you should read that one as "in some cases ..." – Jens Gustedt Dec 09 '22 at 12:15
  • @JensGustedt: Indeed, in C89 the behavior would have been defined for all types which don't have trap representations, though even in the C89 days many implementations would not always have behaved in a manner consistent with storage holding an arbitrary bit pattern. Unfortunately, the Standard has no vocabulary to characterize a behavior which is less specific than behaving as though an object holds an unspecified bit pattern, but is more specific than "anything can happen" Undefined Behavior. Treating partially-initialized aggregates with loose semantics could allow optimizations... – supercat Dec 12 '22 at 17:06
  • ...that would not be possible if all objects had to be regarded as holding some (possibly initially unknown and arbitrary) bit pattern, but only if programmers who only needed such loose semantics could leave objects partially uninitialized. Given `struct foo t; extern struct foo x,y;`, along with code that partially initializes `t`, saying that `x=t; y=t;` may leave portions of `x` and `y` that weren't set in `t` holding *independent* Unspecified bit patterns would seem better than having to either require that `x` and `y` match, or that such action would trigger "anything can happen" UB. – supercat Dec 12 '22 at 17:21
  • @supercat Can you address my comment above about `uint32_t`? Note: I agree with Jens that "the NaT is a state of a hardware register", the NaT is not contained in the object ("region of data storage in the execution environment, the contents of which can represent values"). – pmor Mar 03 '23 at 08:54
  • @pmor: The authors of the Standard made no particular effort to ensure that no constructs which should have a defined meaning were characterized as UB, but did seek to avoid saying more than they had to about the behavior of non-portable programs. There shouldn't be anything special about character types beyond the fact that they're the only types that are *guaranteed* to (1) exist, (2) have no trap representations, and (3) have no alignment requirements. The proposition that that other types exist with the first two characteristics would represent a non-portable assumption, though one... – supercat Mar 03 '23 at 15:41
  • ...which would in practice have to be true on any platform where type `uint32_t` exists. I think NaT is a red herring, since even in C89 days there have been implementations where e.g. `unsigned short x; unsigned long y; ...code that doesn't affect x ... y=x;` could set `y` to a value beyond the range of an `unsigned short`, and I wouldn't be surprised if on some of them that could even happen with `if (x < 65536) y=x;`. – supercat Mar 03 '23 at 15:47
26

The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.

Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2^(n-1)-1) and 2^(n-1)-1 as a trap representation. See Jens Gustedt's answer.

Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment

unsigned x, y, z;   /* 0 */
y = 0;              /* 1 */
z = 4;              /* 2 */
x = - x;            /* 3 */
y = y + z;          /* 4 */
x = y + 1;          /* 5 */

When line 3 is evaluated, x is not initialized yet; therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen, due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:

r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1 + 1;

The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4, and not 5 and 4 as would happen if x had been properly initialized.

For a more elaborate example, consider the following code fragment:

unsigned i, x;
for (i = 0; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}

Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, and evaluating -x while x is uninitialized would be undefined, the compiler knows that the first run through the loop cannot (in a defined execution) take the -x branch. Therefore the first execution of the loop body is equivalent to x = some_value(), and there's no need to test the condition. The compiler may compile this code as if you'd written

unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}

The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is used is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:

  • during the first loop iteration, x is uninitialized by the time -x is evaluated.
  • -x has undefined behavior, so its value is whatever-is-convenient.
  • The optimization rule condition ? value : value applies, so this code can be simplified to condition; value.

When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.

I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimization good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)

This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization — it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.

Gilles 'SO- stop being evil'
  • +1 Nice explanation, the standard is taking special care of that situation. For a continuation of that story see my answer below. (too long to have as a comment). – Jens Gustedt Aug 15 '12 at 07:15
  • @JensGustedt Oh, your answer makes a very important point that I (and others) missed: unless the type has trap values, which for an unsigned type requires “wasting” at least one bit, `x` has an uninitialized value but the behavior on accessing would be defined if x didn't have register-like behavior. – Gilles 'SO- stop being evil' Aug 15 '12 at 12:20
  • @Gilles: at least clang makes the kind of optimizations you mentioned: [(1)](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html), [(2)](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html), [(3)](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html). – Vlad Aug 18 '13 at 00:03
  • 1
    What practical advantage is there to having clang process things in that fashion? If downstream code never uses the value of `x`, then all operations on it could be omitted whether its value had been defined or not. If code following e.g. `if (volatile1) x=volatile2; ... x = (x+volatile3) & 255;` would be equally happy with any value 0-255 that `x` might contain in the case where `volatile1` had yielded zero, I would think an implementation that would allow the programmer to omit an unnecessary write to `x` should be regarded as higher quality than one which would behave... – supercat May 29 '18 at 20:06
  • ...in totally unpredictable fashion in that case. An implementation that would reliably raise an implementation-defined trap in that case might, for certain purposes, be regarded as being of higher quality yet, but behaving totally unpredictably seems to me like the lowest-quality form of behavior for pretty much any purpose. – supercat May 29 '18 at 20:08
17

Yes, the program might crash. There might, for example, be trap representations (bit patterns which do not represent a valid value of the type) which may cause a CPU trap; left unhandled, that trap could crash the program.

(6.2.6.1 on a late C11 draft says) Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)

eq-
  • 9,986
  • 36
  • 38
  • Can you give at least one example of a bit pattern for an integer that can drive CPU crazy? –  Aug 14 '12 at 23:52
  • 3
    @VladLazarenko: This is about C, not particular CPUs. Anyone can trivially design a CPU that has bit patterns for integers that drive it crazy. Consider a CPU that has a "crazy bit" in its registers. – David Schwartz Aug 14 '12 at 23:53
  • @VladLazarenko, depends on the CPU. There are none for integers on x86. – eq- Aug 14 '12 at 23:53
  • 2
    So can I say, then, that the behavior is well defined in case of integers and x86? –  Aug 14 '12 at 23:55
  • @VladLazarenko: Well, if it's undefined, then no -- unless your compiler specifically *says* it's defined, you can't assume it's defined, because you can't assume the compiler will emit the instructions you expect (it will likely avoid doing so, for optimization). – user541686 Aug 14 '12 at 23:56
  • 3
    Well, theoretically you could have a compiler which decided to only use 28-bits integers (on x86) and add specific code to handle each addition, multiplication (an so forth) and ensure that these 4 bits go unused (or emit a SIGSEGV otherwise). An uninitalized value could cause this. – eq- Aug 14 '12 at 23:58
  • 4
    I hate when someone insults everyone else because that someone doesn't understand the issue. Whether the behavior is undefined is entirely a matter of what the standard says. Oh, and there is nothing at all practical about eq's scenario ... it's entirely contrived. – Jim Balter Aug 15 '12 at 00:17
  • 2
    P.S. David Schwartz's idea under the other answer is a more practical idea and suggests another ... suppose that physical memory isn't allocated to virtual addresses until initialized or written to; then accessing an uninitialized variable could result in an access violation. – Jim Balter Aug 15 '12 at 00:28
  • 8
    @Vlad Lazarenko: Itanium CPUs have a NaT (Not a Thing) flag for each integer register. The NaT Flag is used to control speculative execution and may linger in registers which aren't properly initialized before usage. Reading from such a register with a NaT bit set yields an exception. See http://blogs.msdn.com/b/oldnewthing/archive/2004/01/19/60162.aspx – Nordic Mainframe Aug 15 '12 at 00:55
  • This explanation is insufficient, it only states half of the story for the case that the value happens to be a trap representation. It still is UB by the standard, but for another reason. Please see my answer. – Jens Gustedt Aug 15 '12 at 07:17
  • 1
    @eq- you are just not given the good reasons. In this case the UB has nothing to do with trap representations. It comes from the fact that the address of the variable is never taken. So I take it back, you are not telling half the story, you are telling the wrong story. – Jens Gustedt Aug 15 '12 at 11:38
  • @JensGustedt, too much real-world thinking spoils good theoretical issues :( – eq- Aug 15 '12 at 11:48
  • @eq- probably my English isn't good enough to capture what you are trying to say. This is not a theoretical issue. Unspecific values may be used under certain circumstances, in my answer I have given valid code for that. – Jens Gustedt Aug 15 '12 at 12:23
  • @JensGustedt, my example, it seems, is more of a theoretical issue for all but theoretical implementations of `unsigned` integer types. – eq- Aug 15 '12 at 12:30
  • This answer is incorrect where it states “So yes, the behavior is indeed undefined.” As the answers of myself and Jens Gustedt show (with citations from the C standard, which this answer does not provide), taking the value of an uninitialized object does not by itself cause undefined behavior. In C 1999, undefined behavior only occurs if certain other conditions are met, and those conditions are not met for integer types on most common systems. See Jens Gustedt’s answer for the C 2011 situation. – Eric Postpischil Aug 15 '12 at 13:07
  • @EricPostpischil: It is not uncommon for uninitialized variables to behave as though they have values outside the range of their type. IMHO, there should be a category of behavior to cover such things, which would--unlike Implementation-Defined behavior--not require an implementation to define what would happen in detail, but--unlike UB--would not grant compilers unlimited latitude either. – supercat Apr 14 '16 at 05:24
13

(This answer addresses C 1999. For C 2011, see Jens Gustedt’s answer.)

The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)

3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.

So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap representation does cause undefined behavior, per 6.2.6.1 5.

Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2^N − 1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).

If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
  • If `x` has a trap representation, then `x -= x` might trap, right? Still, +1 for pointing out unsigned integers with no extra bits must have defined behavior -- it's clearly the opposite of the other answers and (according to the quote) it seems to be what the standard implies. – user541686 Aug 15 '12 at 01:11
  • Yes, if the type of `x` has a trap representation, then `x -= x` might trap. Even simply `x` used as a value might trap. (It is safe to use `x` as an lvalue; writing into an object will not be affected by a trap representation that is in it.) – Eric Postpischil Aug 15 '12 at 01:33
  • unsigned types rarely have a trap representation – Jens Gustedt Aug 15 '12 at 07:08
  • 1
    Quoting [Raymond Chen](https://blogs.msdn.microsoft.com/oldnewthing/20040119-00/?p=41003), "On the ia64, each 64-bit register is actually 65 bits. The extra bit is called “NaT” which stands for “not a thing”. The bit is set when the register does not contain a valid value. Think of it as the integer version of the floating point NaN. ... if you have a register whose value is NaT and you so much as breathe on it the wrong way (for example, try to save its value to memory), the processor will raise a STATUS_REG_NAT_CONSUMPTION exception". I.e., a trap bit can be completely outside the value. – Cheers and hth. - Alf Apr 06 '16 at 20:21
  • **−1** The statement "If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior." fails to consider schemes like the ia64 NaT bits. – Cheers and hth. - Alf Apr 06 '16 at 20:26
  • @Cheersandhth.-Alf: Even on conventional 32-bit machines, it would not be unusual for an uninitialized variable of type uint16_t to have a value outside the range 0..65535, and for a function of return type uint16_t that returns that variable to pass its value through to the caller without masking. – supercat May 09 '16 at 20:12
  • @supercat: `uint16_t` (from `<stdint.h>`) is an exact width type. And the C++ standard only permits three possible value encodings, which for 16 bits cannot produce "a value outside the range 0..65535", which you claim is "not unusual". I.e. you're just wrong about that. The problem isn't exceeding the value range, and in practice not even trap representations, but possible additional information about the value, or rather, about the lack of a specified value. – Cheers and hth. - Alf May 09 '16 at 22:52
  • @Cheersandhth.-Alf: I've seen a number of compilers, including gcc's ARM compilers, generate code where registers allocated to uninitialized variables can hold arbitrary values which need not fit the variables' range. E.g. ARM gcc 4.8.2 given `uint16_t foo(uint32_t x, uint32_t y, uint32_t z) { uint16_t q; if (x) q=x; return q; }` will generate code that, if invoked from outside code, will return all 32 bits of z if x is zero. – supercat May 09 '16 at 23:12
  • @Cheersandhth.-Alf: If use of such variables is UB, such code is conforming. If it's not, such code might or might not be conforming, but it's been commonplace behavior for a long time and it allows more efficient code than would otherwise be possible [though in the above case gcc generates needlessly-inefficient code]. – supercat May 09 '16 at 23:15
  • @supercat: I see what you mean, that bits outside the variable can be affected. And if e.g. the result of `foo()` is converted to 32 bits under an assumption that the higher bits of its 32-bit location are zero, then oops. So it's a real problem that I didn't think of. – Cheers and hth. - Alf May 09 '16 at 23:57
  • @Cheersandhth.-Alf: I think the Standard regards use of Indeterminate Value as UB because that's easier than trying to describe everything that can happen, but I think that's unfortunate because there are many cases where code "passes through" values that may or may not be meaningful to recipients that may or may not use them (but who won't use them if they're not meaningful), and making any rvalue conversion of Indeterminate Values invoke Undefined Behavior makes it necessary to add code to ensure that Indeterminate Values can't get passed through. – supercat May 10 '16 at 14:29
  • @Cheersandhth.-Alf: I would like to see the Standard recognize the concept of storage locations holding a non-deterministic union of values, such that operations that must yield a definite result (e.g. an "if" test) can behave as though the storage location held any value it might hold, and other operations (like "+") can yield a non-deterministic union of all values that could have been yielded by source operands. – supercat May 10 '16 at 14:31
12

For any variable of any type, which is not initialized or for other reasons holds an indeterminate value, the following applies for code reading that value:

  • In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
  • Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
  • Otherwise if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].

    The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.


[1]: C11 6.3.2.1:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

[2]: C11 6.2.6.1:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

[3] C11:

3.19.2
indeterminate value
either an unspecified value or a trap representation

3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.

3.19.4
trap representation
an object representation that need not represent a value of the object type

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • I would argue this resolves to "It is always undefined behavior" as the C abstract machine -can- have trap representations. Just because your implementation does not use them does not make the code defined. In fact, from what I can tell, a strict reading would not even insist the trap representations have to be in hardware; I don't see why a compiler could not decide a specific bit pattern is a trap, check for it every time the variable is read and invoke UB. – Vality Mar 08 '17 at 19:32
  • Note that possibly `unsigned char` is exempt from this for reasons mentioned above. – Vality Mar 08 '17 at 19:40
  • 3
    @Vality In the real world, 99.9999% of all computers are two's complement CPUs without trap representations. Therefore no trap representation is the norm and discussing the behavior on such real-world computers is highly relevant. To assume that wildly exotic computers is the norm isn't helpful. Trap representations in the real world are so rare that the presence of the term trap representation in the standard is mostly to be regarded as a standard defect inherited from the 1980s. As is support for one's complement and sign & magnitude computers. – Lundin Mar 09 '17 at 07:37
  • 3
    By the way, this is an excellent reason why `stdint.h` should always be used instead of the native types of C. Because `stdint.h` enforces 2's complement and no padding bits. In other words, the `stdint.h` types aren't allowed to be full of crap. – Lundin Mar 09 '17 at 07:40
  • 2
    Again the committee response to the defect report says that: "The answer to question 2 is that any operation performed on indeterminate values will have an indeterminate value as a result." and "The answer to question 3 is that library functions will exhibit undefined behavior when used on indeterminate values." – Antti Haapala -- Слава Україні Oct 02 '17 at 11:35
  • 1
    @AnttiHaapala Yes I know of that DR. It doesn't contradict this answer. You may get an indeterminate value when reading an uninitialized memory location and it is not necessarily the same value every time. But that is _unspecified_ behavior, not _undefined_ behavior. – Lundin Oct 02 '17 at 11:39
  • Detail on [`<stdint.h>`](https://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior/40674888#comment72501610_40674888). The optional _Exact-width integer types_ are specified to have no padding bits and a two’s complement representation. The header also has required _Minimum-width integer types_ and _Fastest minimum-width integer types_, which are not specified to have no padding bits and a two’s complement representation. – chux - Reinstate Monica Dec 19 '17 at 19:02
  • @AnttiHaapala: The authors of the Standard expect that people seeking to produce quality implementations will try to support behavioral guarantees beyond those mandated by the Standard in cases where the benefit to users would exceed the cost. For many implementations' intended purposes, it would be useful, and cost almost nothing, to guarantee that when the bytes of an object hold Indeterminate Values, every read of the object will behave as though those bytes at worst held a (possibly different) Unspecified Value. That may not be true of all implementations, however. – supercat Aug 29 '18 at 18:55
  • @AnttiHaapala: The DR says that implementations aren't *required* to offer such guarantees, which by default makes the question of whether to support them a Quality of Implementation issue. Neither implementations for purposes which would be incompatible with such guarantees, nor garbage-quality-but-conforming implementations, should be expected to uphold such guarantees, but the Standard is silent on the issue of whether general-purpose implementations that don't should be recognized as being of inferior quality. I think they should, but that's a matter of judgment. – supercat Aug 29 '18 at 19:09
  • https://gcc.godbolt.org/z/84f76PnWY I do not think that “unspecified behavior” applies to this example. You might as well call it undefined behavior, it would be shorter than trying to make a sentence to explain this. (The GCC developers are aware of this example and say that their compilation of it is according to the spirit of the standard, something something wobbly values that are still not described in C23.) – Pascal Cuoq Nov 14 '22 at 07:56
  • @PascalCuoq Sure it does, that example is making assumptions regarding the value of a variable with indeterminate value. The compiler is free to always assign the very same value to it, in case it fancies, and optimise accordingly. – Lundin Nov 14 '22 at 08:02
  • @Lundin This sounds like the description of another example. In my example, when `f` is applied to `0`, an `unsigned char` variable (the `unsigned char` type never has trap representations) has a value greater than 500. Or the word “value” is meaningless, to the point that one might as well save words and call the situation UB. – Pascal Cuoq Nov 14 '22 at 12:11
  • @PascalCuoq What I meant is that _the compiler_ is allowed to assume that the contents of the `unsigned char` is always a certain fixed, unspecified value, known to the _compiler_ but not to the _programmer_. This is the very definition of an unspecified value. Unspecified behavior means just that: the compiler is allowed to implement deterministic but perhaps surprising behavior if relied upon by the programmer, and it need not document how. Notably your example yields the same machine code for `i>100` too, so this could as well be a gcc bug. clang generates different code. – Lundin Nov 14 '22 at 12:27
  • @PascalCuoq What's important here from the "language lawyer" perspective is that nothing in the C standard says that accessing an indeterminate value results in UB, given the premises listed in this answer. What certain compilers do and don't is a conformance and/or quality of implementation question for that specific compiler. To just yell "undefined behavior" and run off into the woods, as gcc appears to do, is not conforming IMO. – Lundin Nov 14 '22 at 12:31
  • @Lundin: One issue is that the Standard allows, probably deliberately, for compilers to assign a 32-bit register to hold an 8-bit `unsigned char;` value and then, if the value is used before initialization, behave as though it somehow holds a value outside the range 0-255. It also clearly allows a compiler that ensures that any register used for an unsigned char always holds a value 0-255 to omit code that be irrelevant if it does. As a general rule, the Standard makes no attempt to consider when compilers should or should not be allowed to combine optimizations which would be harmless... – supercat Dec 12 '22 at 17:31
  • ...if applied separately, but would cause a program to completely unravel if combined. It would be useful if the Standard could recognize a category of implementations which will recognize when aspects of behavior that would have been irrelevant in code as written become relevant as a result of compiler optimization. Given e.g. `unsigned char a; ... unsigned q=a; proc1(q); if (q < 256) proc2(q);`, such an implementation could generate code that might store an unmasked value into `q`, pass it to `proc1`, and then skip `proc2` if the value was over 255, or it could mask the value, but... – supercat Dec 12 '22 at 17:49
  • ...it would not be allowed to generate code that could invoke `proc2` if `q` is greater than 255. – supercat Dec 12 '22 at 17:51
11

Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as all other cases of undefined behavior -- the compiler doesn't have to output special code to make this work.

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?

Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it is not required to make it fail.

David Schwartz
  • 179,497
  • 17
  • 214
  • 278
  • 1
    The compiler doesn't have to have special code for this either, though. Simply allocating the space (as always) and *not* initializing the variable gives it the correct behavior. I don't think that needs special logic. – user541686 Aug 14 '12 at 23:51
  • @Mehrdad: That's completely false. Consider two cases: 1) Floating point numbers that have representations that don't return zero when subtracted from themselves such as NaNs. 2) Hardware that treats uninitialized memory specially. (In any event, that's not a problem. If you think no special code is needed, then great. The standard doesn't require any. So perfect. If any is needed though, the standard doesn't require the compiler to do it.) – David Schwartz Aug 14 '12 at 23:52
  • (1) They could've just said implementation-defined, or maybe required it for (unsigned?) integral types, since it isn't any extra work to "leave the contents as-is" anyway. (2) Hmm... I'm not sure I know what you mean. Like how would it treat uninitialized memory specially, and why could that be useful? – user541686 Aug 14 '12 at 23:54
  • 7
    1) Sure, they could have. But I can't think of any argument that would make that any better. 2) The platform knows that the value of uninitialized memory cannot be relied on, so it's free to change it. For example, it can zero uninitialized memory in the background to have zeroed pages ready for use when needed. (Consider if this happens: 1) We read the value to subtract, say we get 3. 2) The page gets zeroed because it's uninitialized, changing the value to 0. 3) We do an atomic subtract, allocating the page and making the value -3. Oops.) – David Schwartz Aug 14 '12 at 23:59
  • Oooooh, very interesting! That makes a lot of sense, thanks! :) – user541686 Aug 15 '12 at 00:01
  • Note that even unsigned types are allowed to have padding bits and thus trap representations. – R.. GitHub STOP HELPING ICE Aug 15 '12 at 00:25
  • @Mehrdad: Something important to remember though is that those who designed the standard likely didn't have any specific scenario in mind. They could only guess what future computers and hardware would be like and couldn't reliably predict what effect their decisions would have. So they only required behavior that they felt they needed to require to allow people to build correct programs and they viewed every requirement as having potential cost that had to be justified by a benefit. Requiring any predictable behavior for uninitialized data failed that test, in their opinion. – David Schwartz Aug 15 '12 at 00:42
  • @DavidSchwartz please add your example to the answer --- it's one of the best I've seen. – tobyodavies Aug 15 '12 at 03:43
  • 3
    -1 because you give no justification for your claim at all. There are situations where it would be valid to expect that the compiler just takes the value that is written in the memory location. – Jens Gustedt Aug 15 '12 at 07:18
  • 1
    @JensGustedt: I don't understand your comment. Can you please clarify? – David Schwartz Aug 15 '12 at 07:26
  • 4
    Because you just claim that there is a general rule, without refering to it. As such it is just an attempt of "proof by authority" which is not what I expect on SO. And for not effectively arguing why this couldn't be an unspecific value. The sole reason that this is UB in the general case is that `x` could be declared as `register`, that is that its address is never taken. I don't know if you were aware of that (if, you were hiding it effectively) but a correct answer must mention it. – Jens Gustedt Aug 15 '12 at 08:51
  • 2
    This answer is incorrect where it states “Yes, it's undefined.” As the answers of myself and Jens Gustedt show (with citations from the C standard, which this answer does not provide), taking the value of an uninitialized object does not by itself cause undefined behavior. In C 1999, undefined behavior only occurs if certain other conditions are met, and those conditions are not met for integer types on most common systems. See Jens Gustedt’s answer for the C 2011 situation. – Eric Postpischil Aug 15 '12 at 13:09
  • @EricPostpischil: On many real compilers for 32-bit machines, an uninitialized variable of type `uint16_t` may hold values outside the range 0-65535. How would that be allowable if such values were not considered to be trap representations? – supercat May 26 '16 at 23:14
0

While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:

#include <stdint.h>

volatile uint32_t a,b;
uint16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
  uint16_t temp;
  if (a)
    temp = y;
  else if (b)
    temp = z;
  return temp;
}

a compiler for a platform like the ARM where all instructions other than loads and stores operate on 32-bit registers might reasonably process the code in a fashion equivalent to:

volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uint32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
  // Since x is never used past this point, and since the return value
  // will need to be in r0, a compiler could map temp to r0
  uint32_t temp;
  if (a)
    temp = y;
  else if (b)
    temp = z & 0xFFFF;
  return temp;  
}

If either volatile read yields a non-zero value, r0 will be loaded with a value in the range 0..65535. Otherwise it will yield whatever it held when the function was called (i.e. the value passed in x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe a value whose type is uint16_t but which lies outside the range 0..65535, except to say that any action which could produce such a value invokes UB.

supercat
  • 77,689
  • 9
  • 166
  • 211
  • Interesting. So are you saying the accepted answer is wrong? Or are you saying it's right in theory but in practice compilers may do weirder things? – user541686 Aug 08 '16 at 18:31
  • @Mehrdad: It is common for implementations to have behavior which goes beyond the bounds of what would be possible in the absence of UB. I think it would be helpful if the Standard recognized the concept of a partially-indeterminate value whose "allocated" bits will behave in a fashion that is, at worst, unspecified, but with additional upper bits that behave non-deterministically (e.g. if the result of the above function is stored to a variable of type `uint16_t`, that variable might sometimes read as 123 and sometimes 6553623). If the result ends up being ignored... – supercat Aug 08 '16 at 18:42
  • ...or used in such a way that any possible ways it might be read would all yield final results meeting requirements, the existence of partially-indeterminate value shouldn't be a problem. On the other hand, there is nothing in the Standard which would allow for the existence of partially-indeterminate values in any circumstances where the Standard would impose any behavioral requirements whatsoever. – supercat Aug 08 '16 at 18:44
  • It seems to me that what you are describing is exactly what is in the accepted answer -- that if a variable *could* have been declared with `register`, then it may have extra bits that make the behavior potentially undefined. That's exactly what you're saying, right? – user541686 Aug 08 '16 at 19:14
  • @Mehrdad: The accepted answer focuses on architectures whose registers have an extra "uninitialized" state, and trap if an uninitialized register is loaded. Such architectures exist, but are not commonplace. I describe a scenario where *commonplace* hardware may exhibit behavior which is outside the realm of anything contemplated by the C Standard, but would be usefully constrained if a compiler doesn't add its own additional wackiness to the mix. For example, if a function has a parameter that selects an operation to perform, and some operations return useful data but others don't,... – supercat Aug 08 '16 at 19:29
  • ...then in the cases where a caller specifies an operation that doesn't return useful data, being able to return an unitialized value may allow slightly more efficient code generation than having to load a meaningless value. – supercat Aug 08 '16 at 19:34
  • I think if you read the accepted answer carefully, it does not say that this behavior *only* exists on architectures with trap representations. Rather, it says that IF such an architecture would have such a problem with a `register` variable, then the code has undefined behavior -- even if that's not the architecture you're actually targeting. Try re-reading it and let me know if you disagree. – user541686 Aug 08 '16 at 19:51
  • @Mehrdad: From the accepted answer: *Such variables are treated specially because there are architectures that have real CPU registers that have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.* If a 32-bit value used for a uint16_t has its upper bits set, that would represent a state outside the domain of uint16_t, but the processor would neither know nor care that the register was being used for a uint16_t, and would thus see nothing special about the value in the register. – supercat Aug 08 '16 at 19:55
  • That quote is talking about the treatment of variables, a C concept. What you just said is about the CPU's treatment of registers, an external concept. So of course the CPU doesn't necessarily know the variable's data type, but that's not what the quote is saying. The quote is saying, "when some values may be outside the domain on SOME architectures, the behavior is undefined in the language (i.e. everywhere), because on those particular architectures, it could have been a trap representation". – user541686 Aug 08 '16 at 20:06
  • @Mehrdad: The quote suggests that the primary *reason* the behavior is undefined is the existence of hardware registers which recognize an "uninitialized" state. Much of the C Standard was predicated on a philosophy that if behavior was defined on some platforms but not others before C89 was published, leaving it undefined in the Standard should preserve that status quo; such a philosophy still holds in much of the world of commercial embedded compilers (excluding gcc), so the possibility of weird "natural" behavior may be very important in such contexts. – supercat Aug 08 '16 at 20:34
  • @Mehrdad supercat likes to post extended comments as answers. This non-answer has no bearing on the question or its accepted answer. – Jim Balter Aug 15 '16 at 01:48
  • @JimBalter: The question asks "why" the Standard says such actions invoke UB. For almost any question "Why does document X say Y" the obvious correct-but-unhelpful answer would be "Because that's what the authors wrote", but that would immediately prompt "What reasons would the authors have had for writing that". I therefore regard questions that ask why document X says Y as implicitly asking "What reasons would the authors of document X have had for saying Y". Do you regard such inferences as inappropriate? – supercat Aug 15 '16 at 15:25
  • A C compiler must implement the C language. How it does so is up to the compiler. If the rules of the language on a particular arch rule out undefined behaviour, the compiler has to implement it. For example, if a translation results in a potential trap representation where there must not be one according to C, the compiler has mistranslated the program. So on such architectures, the compiler might have to generate a different, but correct, sequence instead. – Remember Monica Mar 18 '22 at 02:31
  • @RememberMonica: Indeed, but the authors of the Standard sought to deliberately classify as Undefined Behavior any situation where there might exist some implementation where a trap representation might cause weirdness, even if 99% of implementations were expected to behave in at least somewhat predictable fashion (e.g. yield some kind of likely-meaningless value). – supercat Mar 18 '22 at 14:40