12

Consider this code in block scope:

struct foo { unsigned char a; unsigned char b; } x, y;
x.a = 0;
y = x;

C [N1570] 6.3.2.1 2 says “If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.”

Although a member of x has been assigned a value, no assignment to x has been performed, and its address has not been taken. Thus, it appears 6.3.2.1 2 tells us the behavior of x in y = x is undefined.

However, if we had assigned a value to every member of x, it would seem unreasonable to consider x to be uninitialized for the purposes of 6.3.2.1 2.

(1) Is there anything in the standard which, strictly speaking, causes 6.3.2.1 2 not to apply to (make undefined) the code above?

(2) Supposing we were modifying the standard or determining a reasonable modification to 6.3.2.1 2, are there reasons to prefer one of the following over the others? (a) 6.3.2.1 2 does not apply to structures. (b) If at least one member of a structure has been assigned a value, the structure is not uninitialized for purposes of 6.3.2.1 2. (c) If all named¹ members of a structure have been assigned a value, the structure is not uninitialized for purposes of 6.3.2.1 2.

Footnote

1 Structures may have unnamed members, so it is not always possible to assign a value to every member of a structure. (Unnamed members have indeterminate value even if the structure is initialized, per 6.7.9 9.)

Eric Postpischil
  • In C++ it's UB due to the potential for trapping `int`, and the default copy constructor performs a memberwise copy. See https://stackoverflow.com/questions/9163555/why-is-this-simple-assignment-undefined-behaviour. But C and C++ diverge on these sorts of things quite widely. I don't *think* the C copy works in the same way. Above my paygrade unfortunately, but FWIW I lean towards the "yes, it's UB" side. – Bathsheba Nov 22 '17 at 11:07
  • @Bathsheba I'm leaning to the other side, because reading uninitialized values is only UB if the indeterminate value is a trap representation, and `int` values rarely (if ever?) have those. On the other hand, [there's this](https://stackoverflow.com/a/11965368/440558) so I'm not so sure any more. Really above my paygrade as well... :) – Some programmer dude Nov 22 '17 at 11:12
  • @EricPostpischil: To keep the pedants at bay therefore, would it be wise amending the question to use `unsigned char` types? – Bathsheba Nov 22 '17 at 11:16
  • 2
    I just edited to use `unsigned char` to make it clear we are not interested in trap representations, just the meaning of 6.3.2.1 2. – Eric Postpischil Nov 22 '17 at 11:16
  • 1
    @Bathsheba: Yes. I was reluctant to edit the question at first, in case it would be a bit unfair to people drafting answers. But clearly I need to take pre-emptive measures. – Eric Postpischil Nov 22 '17 at 11:17
  • FYI, the question arose from [this question](https://stackoverflow.com/questions/47430860/do-a-union-or-struct-permit-assignment-from-an-uninitialised-instance). – Eric Postpischil Nov 22 '17 at 11:18
  • @Someprogrammerdude Reading an uninitialized variable that could have been declared `register` is `UB` even if the type has no trap representations (http://port70.net/~nsz/c/c11/n1570.html#6.3.2.1p2). – Petr Skocik Nov 22 '17 at 17:02
  • Related: https://stackoverflow.com/questions/33393569/is-it-undefined-behaviour-to-memcpy-from-an-uninitialized-variable – Tor Klingberg Dec 07 '17 at 17:16

4 Answers

8

My opinion is that it is undefined behaviour simply because it is not explicitly defined by the standard. From 4 Conformance §2 (emphasis mine):

...Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior.

After reading the N1570 draft many times, I cannot find any explicit definition of behaviour for using a partially initialized struct. On one hand, 6.3.2.1 §2 says:

...If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined

so here x is automatic, has never been initialized (only one of its members has), and admittedly its address is never taken, so we could think that it is explicitly UB.

On the other hand, 6.2.6.1 §6 says:

... The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.

As 6.2.6.1 §5 has just defined a trap representation:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

we could think that it is always legal to take the value of a struct because it cannot be a trap representation

In addition, it is not clear to me whether setting the value of a member of a struct actually leaves the struct in an uninitialized state.

For all those reasons, I think that the standard does not clearly define what the behaviour should be, and simply for that reason it is undefined behaviour.


That being said, I am pretty sure that any common compiler will accept it and will give y the current representation of x: the value 0 for member a and, for member b, an indeterminate value with the same representation that x.b currently has.

Serge Ballesta
  • 1
    I accepted this answer and awarded the bounty because I think it is the correct answer—technically, the behavior is undefined by the standard, even though that may not have been the intent. – Eric Postpischil Nov 30 '17 at 15:30
  • I'm not sure I follow your logic. The guarantee that structure/union values will never be trap representations will be very useful if it implies that copying of such objects from one region of readable storage to a disjoint region of writable storage will at worst leave portions of the destination holding Indeterminate Value, and would seldom be useful otherwise. At minimum, that should imply that quality implementations should behave in such fashion unless they document a compelling reason for doing otherwise. As for what specialized or poor-quality implementations might do, who knows? – supercat Aug 30 '18 at 22:32
1

Firstly, let's note that the quoted part of 6.3.2.1/2, the so-called "Itanium clause", is the only clause under which this code might have a problem. In other words, if this clause were not present, the code would be fine. Structs may not have trap representations, so y = x; is otherwise OK even if x is entirely uninitialized. The resolution of DR 451 clarifies that indeterminate values may be propagated by assignment, without causing UB.


Back to the Itanium clause here. As you point out, the Standard does not clearly specify whether x.a = 0; negates the precondition "x is uninitialized".

IMO, this means we should turn to the rationale for the Itanium clause to determine the intent. The purpose of the wording of the standard document, in general, is to implement an intent; generally speaking, I don't agree with being dogmatic about minute detail of the standard: taking shades of meaning out of the wording that were not intended by those who created the wording.

This Q/A gives a good explanation of the rationale. The potential problem is that x might be stored in a register with the NaT bit set, and then y = x will cause a hardware exception due to reading a register that has that bit set.


So the question is: on IA64, does x.a = 0; clear the NaT bit? I don't know, and I guess we would need someone familiar with that platform to give a conclusive answer here.

Naively, I imagine that if x is in a register then, in general, x.a = 0; will need to read the old value, and apply a mask to clear the bits for a, thereby triggering the exception if x was NaT. However, x.a = 0; cannot trigger UB, so that logic must be incorrect. Perhaps IA64 compilers never store a struct in a register, or perhaps they clear the NaT bit on declaration of one, or perhaps there's a hardware instruction to implement x.a = 0; on a previously-NaT register, I don't know.

M.M
  • With regards to your opinion about "the intent", allow me to quote Dennis Ritchie: "The intentions of the committee are irrelevant; only their document matters." ... also, allow me to quote the Rationale document: "this Rationale is not part of the Standard. The C language is defined by the Standard alone." – Dror K. Nov 28 '17 at 03:26
  • 1
    @DrorK. that would be nice, if the document actually answered the questions on its own. But it does not – M.M Nov 28 '17 at 03:29
  • 3
    @DrorK.: Is Dennis Ritchie a normative part of the standard? – Eric Postpischil Nov 28 '17 at 03:43
  • What if the compiler stores `x.a` and `x.b` in two registers? If the structure is initialized or both members are assigned, then both NaT bits are cleared. And we can safely assign either without reading it or the other first, so we are safe from triggering an exception. But, if only one were assigned/initialized, `y = x` would trigger an exception. – Eric Postpischil Nov 28 '17 at 03:47
  • 1
    Sure. I think this question and discussion is academic—it is an exercise for thinking about the standard. I expect we all agree on how compiler writers ought to interpret this—undefined behavior is not intended for the code in the question. But we might end up with a defect to report to the standard committee. – Eric Postpischil Nov 28 '17 at 03:49
  • @EricPostpischil: Any decent compiler should generate code that even if one of the source registers contains a bit pattern that's not valid for its type, the worst that would happen is that the destination register would be left with a value that's not valid for its type. If nothing actually does anything with the value in the destination register except blindly copy it as part of a structure copy, a good compiler should produce code that doesn't trap. – supercat Nov 29 '17 at 04:17
  • 1
    @EricPostpischil: As for whether any ambiguity here is a "defect", that depends whether the Standard is *intended* to fully describe everything that a quality compiler must do, or whether it is intended to omit for brevity situations where (1) there is one obvious sensible behavior, and (2) there is no reason to expect that someone who is seeking to write a quality compiler would do anything else, outside of some diagnostic scenarios that needn't be bound by the Standard anyway. If code copies a partially-written structure, and ignores fields in the copy that weren't written in the original... – supercat Nov 30 '17 at 15:23
  • 1
    @EricPostpischil: ...there might plausibly be some benefit to allowing the copy operation to "raise an implementation-defined signal" to aid conformance with various coding standards, but unless an implementation documents such behavior, I see no reason to regard the cost-benefit ratio for defining the behavior as described as anything less than overwhelmingly in favor. Significant benefit in reduced code size and execution time, generally at zero cost and never at significant cost. – supercat Nov 30 '17 at 15:29
  • 1
    @supercat: Sigh. Discussion about cost-benefit ratios is irrelevant. My question does not ask what is a good design or even what the intent was. My question asks what the standard says. Unfortunately, your comments are not helpful in answering this. – Eric Postpischil Nov 30 '17 at 15:32
  • 1
    @EricPostpischil: The rationale for the Standard recognizes the possibility of an implementation being conforming, but of such low quality as to be essentially useless. Individual places where the Standard fails to mandate features that offer obvious benefits at essentially zero cost would only be "defects" if the Standard were intended to completely specify everything needed to make something a useful quality implementation. Do you think the Standard is intended to be a complete specification? If not, by what means should readers fill in the gaps? – supercat Nov 30 '17 at 15:50
0

Copying a partially-written structure falls in the category of actions which quality implementations will process in consistent fashion absent a good reason to do otherwise, specialized implementations might process differently because they have a good reason to do so, and poor-quality-but-conforming implementations may use as an excuse to behave nonsensically.

Note that copying uninitialized values of an automatic-duration or malloc-created character array would fall in a similar category of actions, except that implementations that would trap on such an action (e.g. to help programmers identify and track down potential information leaks) would not be allowed to describe themselves as "conforming".

An implementation which is specialized to diagnose accidental information leaks might sensibly trap efforts to copy a partially-written structure. On an implementation where using an uninitialized value of some type could result in strange behavior, copying a structure with an uninitialized member of that type and then attempting to use that member of the copy might sensibly do likewise.

The Standard doesn't particularly say whether a partially-written structure counts as having been written or not, because people seeking to produce quality implementations shouldn't care. Quality implementations specialized for detecting potential information leakage should squawk at any attempt to copy uninitialized data, without regard for when the Standard would or would not allow such behavior (provided that they describe themselves as non-conforming). Quality general-purpose implementations designed to support a wide variety of programs should allow partially-initialized structures to be copied in cases where programs don't look at the uninitialized portions outside the context of whole-structure copying (such treatment is useful and generally costs nothing in non-contrived cases). The Standard could be construed as granting poor-quality-but-conforming implementations the right to treat copying of partially-written structures as an excuse to behave nonsensically, but such implementations could use almost anything as such an excuse. Quality implementations won't do anything unusual when copying structures unless they document a good reason for doing so.

supercat
-3

The C Standard specifies that structure types cannot have trap representations, although members of structs may. The primary circumstance in which that guarantee would be useful would be in cases involving partially-written structures. Further, a prohibition on copying structures before one had written all members, even ones the recipient of the copy would never use, would require programmers to write needlessly-inefficient code and serve no useful purpose. Imposing such a requirement in the name of "optimization" would be downright dumb, and I know of no evidence that the authors of the Standard intended to do so.

Unfortunately, the authors of the Standard use the same terminology to describe two situations:

  1. Some implementations define the behavior of some action X in all cases, while some only define it for some; other parts of the Standard define the action in a few select cases. The authors want to say that implementations need not behave like the ones that define the behavior in all cases, without revoking guarantees made elsewhere in the Standard

  2. Although other parts of the Standard would define the behavior of action X in some cases, guaranteeing the behavior in all such cases could be expensive and implementations are not required to guarantee them even in cases where other parts of the Standard would define them.

Before the Standard was written, some implementations would zero-initialize all automatic variables. Thus, those implementations would guarantee the behavior of reading uninitialized values, even of types with trap representations. The authors of the Standard wished to make clear that they did not want to require all implementations to do likewise. Further, some objects may define the behavior of all bit patterns when stored in memory, but not when stored in registers. Such treatment would generally be limited to scalar types, however, rather than structures.

From a practical perspective, defining the behavior of copying a structure as copying the state (defined or indeterminate) of all fields would not cost any more than allowing compilers to behave in arbitrary fashion when copying partially-written structures. Unfortunately, some compiler writers erroneously believe that "cleverness" and "stupidity" are antonyms, and thus behave as though the authors of the Standard wished to invite compilers to assume that programs will never receive any input that would cause structures to be copied after having been partially written.

supercat
  • 1
    Trap representations are not relevant to this question. 6.3.2.1 2 does not involve trap representations; it makes behavior undefined given the stated conditions even if the object type has no trap representations. – Eric Postpischil Nov 28 '17 at 00:46
  • @EricPostpischil: In C89, the difference between an Unspecified Value and an Indeterminate Value is that an object holding the former would be guaranteed to hold a value of its type, while the latter could hold either a value or a trap representation. If you think the authors of the Standard did not intend the prohibition of structures having trap representations to guarantee that partially-written structures may be safely copied if nothing tries to access the indeterminate members of the copies (except for the purpose of making more copies of the structure), perhaps you can... – supercat Nov 28 '17 at 06:53
  • ...enlighten me as to what you think the intended purpose of that prohibition was? I know that modern compiler writers want to process a language that requires that programs waste time writing things with known bit values even in cases where every possible combination of bit values would otherwise meet a program's requirements, but I don't think that's what the authors of C89 had in mind (if it is what the authors of later versions of the Standard had in mind, I'd regard C89 as superior). – supercat Nov 28 '17 at 06:58
  • As M.M. notes, this sentence in 6.3.2.1 2 was created to support Itanium/IA-64’s ability to detect uninitialized data in registers. It is completely separate from trap representations. It even applies to unsigned integers, including char, which have no trap representations. Itanium registers have an extra bit that indicates whether there is valid data in the register. If an object in such a register is used without being initialized or assigned a value, a hardware exception occurs. 6.3.2.1 2 was added to say a C implementation may do this, even if there is no trap representation in the type. – Eric Postpischil Nov 28 '17 at 13:12
  • @EricPostpischil: Long before Itanium, machines have had different representations in registers and memory, and it would not have been unusual to use a register for a type without all bit patterns being valid for that type. If a two-member struct was assigned a pair of registers and one member was written before the struct was copied to a different pair of registers, the unwritten member might have a bit pattern that isn't valid for its type, which could cause trouble *if code actually tried to use the value*. Perhaps the Itanium was sufficiently broken that supporting such semantics... – supercat Nov 28 '17 at 13:37
  • ...would have occasionally cost an extra instruction, but requiring an occasional extra instruction on one platform is far better than requiring that a lot of code be written needlessly inefficiently on every other platform in the universe (and arbitrarily malfunction if it isn't), even when that code isn't going to run on the Itanium. Again, I'd like someone to enlighten me as to what the point of the prohibition against structures having trap representations would be if not to define the behavior of copying partially-written structures. – supercat Nov 28 '17 at 13:39
  • The Itanium is not broken. The sentence was added to the standard to support desired behavior—it allows a C implementation to trap uninitialized behavior in an additional way, beyond trap representations. Yes, prohibiting structures from having trap representations was likely done to allow copying incompletely initialized structures without worrying about trap representations. And it accomplishes that purpose. But this sentence in 6.3.2.1 2 supports a new mechanism entirely separate from trap representations… – Eric Postpischil Nov 28 '17 at 14:56
  • … It even applies to types without trap representations, such as unsigned integers. If the standards committee had thought of it, maybe they would have exempted structures from this. But they did not. – Eric Postpischil Nov 28 '17 at 14:56
  • @EricPostpischil: There are a number of situations where it may be useful for an implementation to be configurable to trap actions or constructs which, although defined, would not be used deliberately in certain application fields. Implementations don't need any permission from the Standard to offer such options, nor do they need permission to offer options that would optimize based on assumptions that code won't do various things that would otherwise have defined behavior. The ability to return partially-initialized structures existed in C from the beginning, and allows more efficient code... – supercat Nov 29 '17 at 04:23
  • ...than would otherwise be possible. If there were a construct which would accept an lvalue and expressly indicate that a compiler must regard it as--at worst--holding an arbitrary bit pattern, then it would be reasonable to deprecate code that returns partially-written structures without using that directive. As yet, however, no such directive exists. – supercat Nov 29 '17 at 04:24