32

Is it valid to copy a struct some of whose members are not initialized?

I suspect it is undefined behavior, but if so, it makes leaving any uninitialized members in a struct (even if those members are never used directly) quite dangerous. So I wonder if there is something in the standard that allows it.

For instance, is this valid?

struct Data {
  int a, b;
};

int main() {
  Data data;
  data.a = 5;
  Data data2 = data;
}
L. F.
Tomek Czajka
  • I recall seeing a similar question a while ago but can't find it. This [question](https://stackoverflow.com/questions/59231856/can-a-trivial-type-class-be-copied-when-not-all-its-members-are-initialized) is related as is [this one](https://stackoverflow.com/questions/15231527/is-it-ok-to-copy-uninitialized-data-if-it-will-be-unused-set-later). – 1201ProgramAlarm Feb 07 '20 at 13:50

4 Answers

25

Yes, if the uninitialized member is not of an unsigned narrow character type or std::byte, then copying a struct containing this indeterminate value with the implicitly defined copy constructor is technically undefined behavior, just as it is for copying a variable of the same type with an indeterminate value, because of [dcl.init]/12.

This applies here, because the implicitly generated copy constructor is, except for unions, defined to copy each member individually as if by direct-initialization, see [class.copy.ctor]/4.

This is also the subject of the active CWG issue 2264.

I suppose in practice you will not have any problem with that, though.

If you want to be 100% sure, using std::memcpy always has well-defined behavior if the type is trivially copyable, even if members have indeterminate value.


These issues aside, you should always initialize your class members properly with a specified value at construction anyway, assuming you don't require the class to have a trivial default constructor. You can do so easily using the default member initializer syntax to e.g. value-initialize the members:

struct Data {
  int a{}, b{};
};

int main() {
  Data data;
  data.a = 5;
  Data data2 = data;
}
walnut
  • well.. that struct isn't a POD (Plain old data)? That means the members will be initialized with default values? It's a doubt – Kevin Kouketsu Feb 07 '20 at 12:30
  • Isn't it a shallow copy in this case? What can go wrong with this unless the uninitialized member is accessed in the copied struct? – TruthSeeker Feb 07 '20 at 12:32
  • @KevinKouketsu I have added a condition for the case where a trivial/POD type is required. – walnut Feb 07 '20 at 12:32
  • @TruthSeeker The standard says that it is undefined behavior. The reason it is generally undefined behavior for (non-member) variables is explained in the answer by AndreySemashev. Basically it is to support trap representations with uninitialized memory. Whether this is *intended* to apply to implicit copy construction of structs is the question of the linked CWG issue. – walnut Feb 07 '20 at 12:35
  • @TruthSeeker The implicit copy constructor is defined to copy each member individually as if by direct initialization. It is not defined to copy the object representation as if by `memcpy`, even for trivially copyable types. The only exception is unions, for which the implicit copy constructor does copy the object representation as if by `memcpy`. – walnut Feb 07 '20 at 12:37
  • @walnut: Unfortunately, the Standard makes no attempt to indicate what should be expected in scenarios where every byte of the structure is occupied by types that have no trap representations (or, for that matter, objects and arrays of type `unsigned char`). There are situations where mandating that such objects behave as Unspecified Value could needlessly impede optimizations, but requiring that programmers explicitly initialize values would be even more expensive. – supercat Feb 24 '20 at 20:23
  • @supercat [\[dcl.init\]/12.3](https://timsong-cpp.github.io/cppwp/n4659/dcl.decl#dcl.init-12.3) and [\[dcl.init\]/12.4](https://timsong-cpp.github.io/cppwp/n4659/dcl.decl#dcl.init-12.4) give the only two exceptions where the copy construction will not lead to UB with indeterminate values, and they specify that the resulting structure also has indeterminate values. This does not mean that the compiler has to copy the actual value in memory. I am not arguing that the current rules in the standard are good. As can be seen from the linked CWG issue, it may well be an unintended defect. – walnut Feb 24 '20 at 21:35
  • @walnut: Both C and C++ suffer from the Standard's massive overuse of "Undefined Behavior" to describe many different concepts, including operations which may be impractical to handle predictably on a few obscure platforms, but should be processed identically by every implementation where the commonplace behavior would be practical and useful. The only situations where the authors of the Standard expected that anyone should care about whether such behaviors were mandated by the Standard would be those where an implementation's customers could benefit from having it do something unusual. – supercat Feb 24 '20 at 21:50
  • @walnut: Consequently, the authors of the C Standard made almost no effort--and the authors of the C++ Standard made relatively little effort--to provide practical and portable ways of accomplishing everything that could be done easily using "popular extensions" that extended the language to behave usefully in cases where the Standard itself imposed no requirements. – supercat Feb 24 '20 at 21:53
  • @supercat Sure, "*undefined behavior*" means only that the standard doesn't define any behavior. If the compiler makes any additional guarantee, then it is not wrong or unsafe to write code with standard-undefined-behavior. My answer here is only about the guarantees that the standard gives, since the question didn't specify anything else. I wrote "*I suppose in practice you will not have any problem with that, though.*", because there probably is no compiler/platform in use where this wouldn't actually behave as expected (at least for `int`, not sure about `float`), as far as I am aware. – walnut Feb 24 '20 at 22:32
  • @walnut: The authors of clang and gcc don't care if there would be any reason for an implementation that isn't being deliberately obtuse to process a construct nonsensically. The notion of "you probably won't have any problem" needs to be augmented with "unless you're using a compiler like clang or gcc which goes out of its way to treat UB as an excuse to behave nonsensically, in which case all bets are off". – supercat Feb 24 '20 at 22:46
13

In general, copying uninitialized data is undefined behavior because that data may be in a trapping state. Quoting this page:

If an object representation does not represent any value of the object type, it is known as trap representation. Accessing a trap representation in any way other than reading it through an lvalue expression of character type is undefined behavior.

Signalling NaNs are possible for floating point types, and on some platforms integers may have trap representations.

However, for trivially copyable types it is possible to use memcpy to copy the raw representation of the object. Doing so is safe since the value of the object is not interpreted, and instead the raw byte sequence of the object representation is copied.

Andrey Semashev
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/216950/discussion-on-answer-by-andrey-semashev-copying-structs-with-uninitialized-membe). – Samuel Liew Jun 30 '20 at 13:10
0

In some cases, such as the one described, the C++ Standard allows compilers to process constructs in whatever fashion their customers would find most useful, without requiring that behavior be predictable. In other words, such constructs invoke "Undefined Behavior". That doesn't imply, however, that such constructs are meant to be "forbidden" since the C++ Standard explicitly waives jurisdiction over what well-formed programs are "allowed" to do. While I'm unaware of any published Rationale document for the C++ Standard, the fact that it describes Undefined Behavior much like C89 does would suggest the intended meaning is similar: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior".

There are many situations where the most efficient way to process something would involve writing the parts of a structure that downstream code is going to care about, while omitting those that downstream code isn't going to care about. Requiring that programs initialize all members of a structure, including those that nothing is ever going to care about, would needlessly impede efficiency.

Further, there are some situations where it may be most efficient to have uninitialized data behave in non-deterministic fashion. For example, given:

struct q { unsigned char dat[256]; } x,y;

void test(unsigned char *arr, int n)
{
  q temp;
  for (int i=0; i<n; i++)
    temp.dat[arr[i]] = i;
  x=temp;
  y=temp;
}

if downstream code won't care about the values of any elements of x.dat or y.dat whose indices weren't listed in arr, the code might be optimized to:

void test(unsigned char *arr, int n)
{
  q temp;
  for (int i=0; i<n; i++)
  {
    int index = arr[i];
    x.dat[index] = i;
    y.dat[index] = i;
  }
}

This improvement in efficiency wouldn't be possible if programmers were required to explicitly write every element of temp.dat, including those that downstream code wouldn't care about, before copying it.

On the other hand, there are some applications where it's important to avoid the possibility of data leakage. In such applications, it may be useful to either have a version of the code that's instrumented to trap any attempt to copy uninitialized storage without regard for whether downstream code would look at it, or it might be useful to have an implementation guarantee that any storage whose contents could be leaked would get zeroed or otherwise overwritten with non-confidential data.

From what I can tell, the C++ Standard makes no attempt to say that any of these behaviors is sufficiently more useful than the others as to justify mandating it. Ironically, this lack of specification may be intended to facilitate optimization, but if programmers can't exploit any kind of weak behavioral guarantees, any optimizations will be negated.

supercat
  • IMHO some people are too sensitive about UB. Your answer makes sense. – Super-intelligent Shade Dec 11 '21 at 18:49
  • 1
    @InnocentBystander: Around 2005, it became fashionable to ignore the distinction between what conforming compilers *could* do, versus what general-purpose compilers *should* do, and also to prioritize the efficiency with which an implementation could process "fully portable" programs, as opposed to the efficiency with which it could most efficiently accomplish the tasks at hand (which might entail the use of constructs which are "non-portable" but widely supported). – supercat Dec 11 '21 at 22:11
-2

Since all members of Data are of primitive types, data2 will get an exact "bit-by-bit copy" of all members of data. So the value of data2.b will be exactly the same as the value of data.b. However, the exact value of data.b cannot be predicted, because you have not initialized it explicitly. It will depend on the values of the bytes in the memory region allocated for data.

ivan.ukr
  • Can you support this with a reference to the standard? The links provided by @walnut imply this is undefined behavior. Is there an exception for PODs in the standard? – Tomek Czajka Feb 12 '20 at 20:41
  • Although following is not link to standard, still: https://en.cppreference.com/w/cpp/language/copy_constructor#Trivial_copy_constructor "TriviallyCopyable objects can be copied by copying their object representations manually, e.g. with std::memmove. All data types compatible with the C language (POD types) are trivially copyable." – ivan.ukr Feb 13 '20 at 10:10
  • The only "undefined behaviour" in this case is that we cannot predict the value of the uninitialized member variable. But the code compiles and runs successfully. – ivan.ukr Feb 13 '20 at 10:13
  • 1
    The fragment you quote talks about the behavior of memmove, but it's not really relevant here because in my code I'm using the copy constructor, not memmove. The other answers imply that using the copy constructor results in undefined behavior. I think you also misunderstand the term "undefined behavior". It means that the language provides no guarantees at all, e.g. the program might crash or corrupt data randomly or do anything. It doesn't just mean that some value is unpredictable, that would be unspecified behavior. – Tomek Czajka Feb 13 '20 at 17:07
  • @ivan.ukr The C++ standard specifies that the implicit copy/move constructors act member-wise as if by direct-initialization, see links in my answer. Therefore the copy construction does *not* make a "*"bit-by-bit copy"*". You are only correct for union types, for which the implicit copy constructor *is* specified to copy the object representation as if by a manual `std::memcpy`. None of this prevents using `std::memcpy` or `std::memmove`. It only prevents using the implicit copy constructor. – walnut Feb 13 '20 at 19:21
  • Yes, agreed, it is member-wise. What I actually meant is that there will be no difference between the values of each member; in this sense you get an exact copy. But that is how it works in general. If we look closer at this particular case, in practice the compiler will place those 2 ints without alignment holes between them, so in this case you likely get an exact bit-by-bit copy. – ivan.ukr Feb 13 '20 at 19:30
  • @ivan.ukr The standard allows `int` to have trap representations which could trigger when being copied by value. Therefore the standard makes all such by-value copies undefined behavior, which OP is asking about. If you just want to say that there won't be any issues in practice on current CPUs for `int`s, then I agree, but if that is all you want to say, then you should make it clearer in your post, especially which architectures you are talking about. A more relevant example for this issue in practice might be signaling NaNs for floating point types. – walnut Feb 13 '20 at 19:43
  • I am not 100% sure that this won't cause any issues in practice on any particular CPU. When something is declared undefined behavior, gcc will often make aggressive optimizations that assume the forbidden scenario will never happen, which may make surrounding code behave in unexpected ways if it does happen. – Tomek Czajka Feb 13 '20 at 20:27
  • So far, I don't know any architecture where GCC or any other compiler would have trap values for ints. If you know, please tell more about it, it will be interesting to know more about that. Standard allows maximum possibilities, and that's great, that's how it should be. But in practice some things never happen. – ivan.ukr Feb 14 '20 at 09:46
  • "When something is declared undefined behavior, gcc will often make aggressive optimizations that assume the forbidden scenario will never happen" - Maximum what compiler can do in this sense is to track the fact that some member was not initialized and just do not copy it. But this may lead to having multiple variants of the default copy constructor and breaks warranty that default copy constructor copies each member. – ivan.ukr Feb 14 '20 at 09:52
  • @ivan.ukr Your "maximum what compiler can do" in case of undefined behavior is false (or at least, unsupported). Compilers can, and will, do all kinds of things in case of undefined behavior. For some examples, see: https://blog.regehr.org/archives/213 – Tomek Czajka Feb 14 '20 at 13:48
  • 1
    @TomekCzajka: Of course, according to the authors of the Standard, UB "...identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." There's a crazy myth that says the authors of the Standard used "Implementation-Defined Behavior" for that purpose, but such a notion is flatly contradicted by what they actually wrote. – supercat Feb 14 '20 at 23:00
  • @supercat Sure, specific compilers can define language extensions that define behaviors undefined by standard c++. GCC provides a number of language extensions (listed here https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html and here https://gcc.gnu.org/onlinedocs/gcc/C_002b_002b-Extensions.html#C_002b_002b-Extensions). But does any of these extensions permit my example code? – Tomek Czajka Feb 15 '20 at 09:15
  • @TomekCzajka Thanks, the article is very interesting. Still, we are not talking here about what compilers can do in general but about what can be done in this particular case. – ivan.ukr Feb 15 '20 at 10:00
  • @TomekCzajka: Most implementations specify that, at least in some configurations, lvalue reads and writes will always behave as implied by the underlying storage. Under C89, storage that held "Indeterminate Value" would either hold a valid value or a trap representation; which meant that for types without trap representations, any arbitrary uninitialized storage would hold a valid (though generally not meaningful) value. Such behavior isn't generally described as an "extension", since before the publication of C99 it was simply the natural state of affairs. – supercat Feb 15 '20 at 16:26
  • 1
    @TomekCzajka: In situations where a behavior that was defined by an earlier standard becomes undefined in a later one, the intention of the Committee was generally not to deprecate the old behavior, but rather to say that *if an implementation could best serve its customers by doing something else*, the Committee didn't want to forbid them from doing so. A major point of confusion with the Standard stems from a lack of consensus among Committee members as to its intended jurisdiction. Most requirements for programs are only applicable to Strictly Conforming Programs... – supercat Feb 15 '20 at 16:33
  • ...and as such the Standard often ignores questions about what behaviors should be defined for constructs that would only be usable in programs that are Conforming but not portable (and thus not Strictly Conforming). In C89, if `Data2` had global scope, behavior of your function would be defined on all implementations, but on platforms where `int` has trap representations, a later attempt to read `Data2.b` would invoke UB (if nothing else had written it in the interim). – supercat Feb 15 '20 at 16:37
  • 1
    @TomekCzajka: I think the Standard could best fit practical reality if it were to recognize that objects whose stored value is accessed via valid pointers must behave as though stored using the defined representation, but stored values that are not accessible via pointers may use other representations that could have trap values even if the defined representations do not. This would allow for the possibility that e.g. an automatic-duration struct with two `uint16_t` values might be stored using two 32-bit registers whose values would not be initialized, and which might behave oddly... – supercat Feb 15 '20 at 16:42
  • ...if they're used without having been written first. For example, given `struct foo {uint16_t a,b;} s; uint32_t x; ... x=s.b; if (x < 65536) do_something(x);`, if `s.b` isn't initialized, even relatively conservative optimizations could allow `do_something()` to be invoked unconditionally with whatever value happened to be in the register associated with `s.b` without ensuring that the upper bits are zero. – supercat Feb 15 '20 at 16:47
  • @ivan.ukr "_So far, I don't know any architecture where GCC or any other compiler would have trap values for ints. If you know, please tell more about it..._" Sorry for being late to the party, but here is one [example](https://devblogs.microsoft.com/oldnewthing/20040119-00/?p=41003). – Super-intelligent Shade Dec 12 '21 at 03:45
  • 1
    @InnocentBystander: The phrase "trap representation" doesn't just refer to things that trigger CPU traps when accessed, but also applies to objects whose representation may violate a compiler's expected invariants in ways whose consequences may be much worse than an OS trap. For example, given `uint1 = ushort1; ... if (uint1 < 70000) foo[uint1] = 123;`, a compiler might generate code that will always make `uint1` be less than 70000 on that path, it might generate code where `uint1` might hold a value bigger than 69999 but perform the comparison and skip the assignment if it was, or it might... – supercat Dec 12 '21 at 18:44
  • 1
    ...generate code where `uint1` might hold a value bigger than 70000 but perform the assignment unconditionally if there was no defined way for `uint1` to be that big. Unfortunately, the Standard's terminology has no way of letting a compiler choose whichever of the first two approaches would be most efficient if a programmer didn't care about what value `uint1` would hold, but not allow a compiler to simultaneously generate code that might set `uint1` to a value greater than 65535 while assuming that `uint1` would never receive such a value. – supercat Dec 12 '21 at 18:52