How may taking the address of an object influence undefined behavior?

Question

On cppreference.com, in the section Implicit conversions, in the subsection "Lvalue conversion", it is noted that

[i]f the lvalue designates an object of automatic storage duration whose address was never taken and if that object was uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined. [emphasis mine]

From that, I understand that the "act of taking an address" of an object at some point in time may somehow influence whether undefined behavior happens later, when this object "is used". If I'm right, that seems unusual, to say the least.

Am I right? If so, how is that possible? If not, what am I missing?

See [this question](https://stackoverflow.com/questions/11962457) and its [first answer](https://stackoverflow.com/questions/11962457/why-is-using-an-uninitialized-variable-undefined-behavior/11965368#11965368). See also [this question](https://stackoverflow.com/questions/25074180). — Steve Summit, Oct 26 '22 at 17:01
@SteveSummit Thanks! That answer answers the question I asked under dbush's answer. Yet another difference between C and C++ I wasn't aware of. :-) — Ted Lyngmo, Oct 26 '22 at 17:07
@TedLyngmo Yes, it's a surprising and absurdly subtle distinction, which is why I had bookmarked that prior question! — Steve Summit, Oct 26 '22 at 17:13
@SteveSummit Thanks, an interesting related thing. I'll bookmark them, too. — user20276305, Oct 26 '22 at 19:14

score 3 · Accepted Answer · answered Oct 26 '22 at 17:13

cppreference.com is deriving this from a rule in the C standard. C 2018 6.3.2.1 2 says:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

So, the reason that taking an address matters is fundamentally because “the C standard says so” rather than because taking the address “does something” in the model of computing.
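As a minimal sketch of which objects the quoted rule covers (the function names `never_addressed` and `addressed` are hypothetical, chosen only for illustration), the two cases might look like this. The reads that would actually be problematic are left in comments so that the code itself stays well defined:

```c
/* Case 1: the address of `x` is never taken, so it can be declared
   `register`. Reading it before any assignment would be undefined
   behavior under C 2018 6.3.2.1 2. */
static int never_addressed(void) {
    register int x;
    /* return x;      <- would be undefined behavior here */
    x = 1;            /* an assignment prior to use makes the read fine */
    return x;
}

/* Case 2: the address of `y` is taken, so it could not have been
   declared `register` and the rule in 6.3.2.1 2 does not apply.
   Before the assignment its value is still indeterminate. */
static int addressed(void) {
    int y;
    int *p = &y;
    /* return y;      <- not covered by 6.3.2.1 2; value is indeterminate */
    *p = 2;           /* initialize through the pointer */
    return y;
}
```

Calling `never_addressed()` returns 1 and `addressed()` returns 2. The difference between the two functions is only in what the standard says about the commented-out reads, not in anything the address-of operator does at run time.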

The reason this rule was added to the C standard was to support some behaviors Hewlett-Packard (HP) desired for its Itanium processor. In that processor, each of certain registers has an associated bit that indicates the register is “uninitialized.” So HP was able to make programs detect and trap in some instances where an object had not been initialized. (This detection does not extend to memory; the bit was only associated with a processor register.)

By saying the behavior is undefined if an uninitialized object is used, the C standard allows a trap to occur but also allows that a trap might not occur. So it permits HP’s behavior of trapping, it permits the cases where HP’s software does not detect the issue and so does not trap, and it permits other vendors to ignore all this and provide whatever value happens to be in a register, as well as other behaviors that might arise from optimization by the compiler.

As for predicating the undefined behavior based on automatic storage duration and not taking the address, I suspect this was a bit of a kludge. It provides a criterion that worked for the parties involved: HP was able to design their compiler to use the “uninitialized” feature with their registers, but the rule does not carve out a great deal of object use as undefined behavior.

For example, somebody might want to write an algorithm that processes large parts of many arrays en masse, ignoring that a few values “along the edges” of defined regions might be uninitialized. The idea there would be that, in some situations, it is more efficient to do a block of operations and then, at the end, carve away the ones you do not care about. Thus, for these situations, programmers want the code to work with “indeterminate values”: the code will execute, will reach the end of the operations, and will have valid values in the results it cares about, and there will not have been any traps or other undefined behavior arising from the values they did not care about. So limiting the undefined behavior of uninitialized objects to automatic objects whose address is not taken may have provided a boundary that worked for all parties concerned.
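The block-processing idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `BLOCK`, `double_block`, and `sum_doubled` are hypothetical, and the element type is `unsigned char`, chosen because it has no trap representations, so reading an indeterminate element yields some unspecified value rather than trapping. The arrays are accessed through pointers, so they fall outside the 6.3.2.1 2 rule.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK = 8 };  /* hypothetical fixed block size */

/* Doubles every element of a whole block, whether or not the caller
   initialized all of them. */
static void double_block(const unsigned char *in, unsigned char *out) {
    for (size_t i = 0; i < BLOCK; i++) {
        out[i] = (unsigned char)(in[i] * 2u);
    }
}

/* Initializes only the first `valid` inputs, processes the full block,
   then sums only the results that correspond to initialized inputs. */
static unsigned sum_doubled(size_t valid) {
    unsigned char in[BLOCK];   /* deliberately not fully initialized */
    unsigned char out[BLOCK];
    unsigned sum = 0;

    assert(valid <= BLOCK);
    for (size_t i = 0; i < valid; i++) {
        in[i] = (unsigned char)(i + 1);
    }

    /* Reads in[valid..BLOCK-1] too; those values are indeterminate,
       and the matching outputs are simply ignored below. */
    double_block(in, out);

    for (size_t i = 0; i < valid; i++) {
        sum += out[i];
    }
    return sum;
}
```

With `valid` equal to 6, the initialized inputs are 1 through 6, so `sum_doubled(6)` returns 42 regardless of what the two trailing elements happen to contain. Tools that check for uninitialized reads may still flag this pattern, which is part of why it is a judgment call in real code.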

You wrote that "[t]he reason this rule was added to the C standard was to support some behaviors Hewlett-Packard (HP) desired for its Itanium processor." By "this rule", do you mean the whole quote, or just the part about the register storage class? — user20276305, Oct 26 '22 at 19:39
@user20276305: The whole quote. That sentence was added in C 2011. Prior to that there was no explicit rule that using any uninitialized object had undefined behavior. — Eric Postpischil, Oct 26 '22 at 19:50
That is interesting, I didn't think about that. It always seemed so reasonable to me that using uninitialized objects produces undefined behavior, as if it had been that way in C from the beginning. Now I need to reflect upon your answer a bit more. From what you said, I understand there was no rule at all about what using uninitialized objects should cause. Could you tell, if you happen to know, how this used to be handled before this rule was introduced? — user20276305, Oct 26 '22 at 20:06
@user20276305: In earlier versions of the C standard, from 1999 at least (have not checked 1990), the “value” of an uninitialized object was “indeterminate.” It still is. When a value is indeterminate, it has some value, but it can be different each time it is used. E.g., `printf("%d %d\n", x, x);` could print “37 -955”. This is different from undefined behavior; if `x` is merely indeterminate, that `printf` must execute and print something for two `int` values. If `x` falls into the undefined behavior category described above, then the `printf` can trap, or the program could do other things. — Eric Postpischil, Oct 26 '22 at 20:11
OK, I think I understand it. Next thing I don't understand is where you wrote that "[b]y saying the behavior is undefined if an uninitialized object is used, the C standard allows a trap to occur but also allows that a trap might not occur." I'm not yet familiar with traps, so I'm not sure I should ask about them (even don't know what to ask about). Anyway I feel a bit confused that you wrote that this supports the behavior of the Itanium processors. Since this rule allows traps both to occur and not to occur, where is the support? Was it that before the rule traps were not allowed at all? — user20276305, Oct 26 '22 at 20:31
@user20276305: Yes, traps were not allowed. Trapping means execution of the program is interrupted, so no further operations are performed. Prior to C 2011, any uninitialized object had an indeterminate value. Using it resulted in a value. The program could not trap; it had to use some value and continue. If HP had implemented traps for using uninitialized values, their C implementation would not have conformed to the C standard. By making a new rule that using certain uninitialized objects has undefined behavior, the standard allows any behavior in those cases, so HP can implement traps. — Eric Postpischil, Oct 26 '22 at 20:37
And now that makes sense! Although it's still not something I could explain to somebody when awakened in the middle of the night (so there is still a long and interesting road ahead of me). Thanks for the explanation! — user20276305, Oct 26 '22 at 20:42

score 2 · Answer 2 · answered Oct 26 '22 at 16:40

If an object had never had its address taken, it could potentially be optimized away. In such cases, attempting to read an uninitialized variable need not yield the same value on multiple reads.

Taking the address of an object guarantees that storage is set aside for it, which can subsequently be read from. The value read will then at least be consistent (though not necessarily predictable), assuming it is not a trap representation.

dbush


Isn't reading the value of an uninitialized variable enough to cause UB in C? The cppreference link seems to imply that it's not (which surprised me). – Ted Lyngmo Oct 26 '22 at 16:45
I'm not yet familiar with code optimizations, but if by "optimizing away" you mean "removing", then it seems to make sense. And, though I think I need to learn a bit more about when exactly storage is set aside for objects, for now your answer seems good enough to rely on. – user20276305 Oct 26 '22 at 16:53
And, at first I understood the quote differently than you: for me the "act" I mentioned caused something in the time it was being done. On the other hand, you seem to understand it not that the "act" may cause anything, but that it in itself is only a kind of assertion. In the context of C, your understanding seems to me to be far more reasonable than mine. – user20276305 Oct 26 '22 at 17:00
An object whose address is taken can be optimized away just as an object whose address is not taken, provided the compiler can determine the address does not affect observable behavior just as it determines the object’s value does not affect observable behavior. All objects have an address in C’s abstract model, and taking the address or not does not affect that, and no object needs to have an address in the implemented program if it has no effect on observable behavior. – Eric Postpischil Oct 26 '22 at 17:14
