Why (if that is the case) does the standard say that copying uninitialized memory with memcpy is UB?

Question

When a class member cannot have a sensible meaning at the moment of construction, I don't initialize it. Obviously that only applies to POD types; you cannot NOT initialize an object with constructors.

The advantage of that, apart from saving CPU cycles initializing something to a value that has no meaning, is that I can detect erroneous usage of these variables with valgrind, which would not be possible if I just gave those variables some arbitrary value.

For example,

struct MathProblem {
  bool finished;
  double answer;

  MathProblem() : finished(false) { }
};

Until the math problem is solved (finished) there is no answer. It makes no sense to initialize answer in advance (to, say, zero) because that might not be the answer. answer only has a meaning after finished has been set to true.

Usage of answer before it is initialized is therefore an error and perfectly OK to be UB.

However, a trivial copy of answer before it is initialized is currently ALSO UB (if I understand the standard correctly), and that doesn't make sense: the default copy and move constructors should simply be able to make a trivial copy (as if using memcpy), initialized or not. For example, I might want to move this object into a container:

v.push_back(MathProblem());

and then work with the copy inside the container.

Is moving an object with an uninitialized, trivially copyable member indeed defined as UB by the standard? And if so, why? It doesn't seem to make sense.

In C, copying such uninitialized POD can lead to "indeterminate value: either an unspecified value or a trap representation". It is that _trap representation_ that stings. — chux - Reinstate Monica, Mar 20 '21 at 08:09
@chux-ReinstateMonica What value does the `int` have when it represents a trap representation? — Carlo Wood, Mar 20 '21 at 08:12
Just because your program has undefined behaviour doesn't mean it won't work just that the standard doesn't require compilers to produce working code. I'm pretty sure any modern compiler will not produce unexpected results here — Alan Birtles, Mar 20 '21 at 08:13
`bool finished; int answer;` that's `std::optional` right here. — n. m. could be an AI, Mar 20 '21 at 08:15
When an `int` has a trap value, it does not have a numeric value. "trap representation: an object representation that need not represent a value of the object type." Trap values with `int` are rare these days. More common today with pointers, but not common overall. — chux - Reinstate Monica, Mar 20 '21 at 08:18
@chux-ReinstateMonica Ok, thanks. I changed the type of answer to double to make it more likely to have a trap representation. A double is trivially-copyable though, isn't it? Doesn't that imply that copying it with std::memcpy is OK? Now I understand that doing double x; x = y; where y has a trap representation allows the compiler to check for that trap representation and cause UB when it has. But that seems orthogonal to the statement that trivially-copyable objects can be copied with memcpy? — Carlo Wood, Mar 20 '21 at 08:25
@CarloWood Seems like you do not want to initialize class members to take advantage of code analyzers that report usage of an uninitialized data - before running the code. Yet an implementation may want to use trap values for the same reason: at run time, report uninitialized usage. Hmmm — chux - Reinstate Monica, Mar 20 '21 at 08:33
@chux-ReinstateMonica The most important reason for me is logic... I'd hate to give something a value when not ANY value has a meaning and I just know I will not (and certainly should not) use that value (except by copying it or moving as part of the copy- or move- constructor, or assignments of the whole encapsulation object). It is the same part of my brain that causes me to write pretty much bug-free code: it objects to doing something that doesn't make sense. If the result of this discussion is that I am convinced that those variables HAVE to be initialized just so I can legally make — Carlo Wood, Mar 20 '21 at 08:56
@chux-ReinstateMonica a copy, then maybe I'll resort to introducing new "standard" macros in my code, or an inlined template function, so that I can initialize `answer` like: `answer(indeterminate())`. With the sole purpose of filling it with a non-trap value :/. — Carlo Wood, Mar 20 '21 at 08:59
@CarloWood Perhaps as `value` is a `double` initialize to NAN? Not a trap value, but certainly something that will "infect" subsequent calculations should they errantly use `answer` before first assignment. Good luck. — chux - Reinstate Monica, Mar 20 '21 at 14:52
Is it **important** for the purposes of this question that the class type be trivially copyable? — Davis Herring, Mar 20 '21 at 15:14
@DavisHerring The only objects that you can leave fully uninitialized (otherwise the problem just shifts to the class containing those objects) are objects that have no constructor, and those are pretty much per definition trivially copyable. Personally I think that they are POD types, but I'm not sure if there is a difference in this regard :/. — Carlo Wood, Mar 22 '21 at 08:33
@CarloWood: Nothing stops you from defining a copy constructor that checks `finished` and *assigns* answer only if it’s set. — Davis Herring, Mar 22 '21 at 15:04
@DavisHerring: A copy constructor that copies `answer` unconditionally would likely be faster than one that goes out of its way to avoid copying it when `finished` is false. — supercat, Mar 23 '21 at 17:47
@supercat: True with current compilers, but that’s a missed optimization that I find slightly surprising. Not assigning the newly created member gives it an indeterminate value which might as well be copied from the argument if that’s faster. — Davis Herring, Mar 24 '21 at 03:16
@DavisHerring: In most cases where copying a field unconditionally would be cheaper than adding code to skip the copy in cases where the field wasn't initialized, the cheapest and easiest approach whose behavior would be defined by the Standard would be to have the source-code program unconditionally initialize the original field and then unconditionally copy it, thus rendering the aforementioned optimization unnecessary. The only time the optimization would help anything would be if the programmer went out of the way to add extra logic that would slow things down... — supercat, Mar 24 '21 at 16:41
...in the absence of optimization, but then the compiler was able to eliminate that useless logic to generate machine code to perform the same sequence of steps the programmer wanted to specify in the first place. If an implementation guaranteed that copying an indeterminate value would have no side effects beyond leaving the destination holding an indeterminate value, code that exploited that guarantee could be optimized more efficiently than code which had to work around its absence since a compiler could omit copy operations in cases where it could prove they were never needed, but... — supercat, Mar 24 '21 at 16:44
...perform them unconditionally in cases where doing so would be cheaper than testing whether they were needed. — supercat, Mar 24 '21 at 16:45
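The copy constructor suggested in the comments above could look roughly like the sketch below. It assumes the `MathProblem` struct from the question; `add_unsolved` is a hypothetical helper added only for illustration. Note that providing these members means the class is no longer trivially copyable, and, as the comments point out, the branch may be slower than an unconditional copy.

```cpp
#include <cassert>
#include <vector>

struct MathProblem {
  bool finished;
  double answer;  // deliberately left unwritten until finished is true

  MathProblem() : finished(false) { }

  // Copy constructor that reads src.answer only when it has been written.
  MathProblem(const MathProblem& src) : finished(src.finished) {
    if (finished) answer = src.answer;
  }

  // Copy assignment following the same rule.
  MathProblem& operator=(const MathProblem& src) {
    finished = src.finished;
    if (finished) answer = src.answer;
    return *this;
  }
};

// Hypothetical helper: stores an unsolved problem in a container.
// The copy into the vector never evaluates the unwritten answer.
inline void add_unsolved(std::vector<MathProblem>& v) {
  v.push_back(MathProblem());
}
```

With this in place, `v.push_back(MathProblem());` goes through the user-provided copy constructor, so the indeterminate `answer` is never evaluated during the copy.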

eerorika · Accepted Answer · 2021-03-20T08:45:50.090

Is moving an object with an uninitialized, trivially copyable member indeed defined as UB by the standard?

Depends on the type of the member. Standard says:

[basic.indet]

When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced ([expr.ass]).

If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:

If an indeterminate value of unsigned ordinary character type ([basic.fundamental]) or std::byte type ([cstddef.syn]) is produced by the evaluation of:

the second or third operand of a conditional expression,

the right operand of a comma expression,

the operand of a cast or conversion ([conv.integral], [expr.type.conv], [expr.static.cast], [expr.cast]) to an unsigned ordinary character type or std::byte type ([cstddef.syn]), or

a discarded-value expression,

then the result of the operation is an indeterminate value.

If an indeterminate value of unsigned ordinary character type or std::byte type is produced by the evaluation of the right operand of a simple assignment operator ([expr.ass]) whose first operand is an lvalue of unsigned ordinary character type or std::byte type, an indeterminate value replaces the value of the object referred to by the left operand.

If an indeterminate value of unsigned ordinary character type is produced by the evaluation of the initialization expression when initializing an object of unsigned ordinary character type, that object is initialized to an indeterminate value. If an indeterminate value of unsigned ordinary character type or std::byte type is produced by the evaluation of the initialization expression when initializing an object of std::byte type, that object is initialized to an indeterminate value.

None of the exceptional cases apply to your example object, so UB applies.

with memcpy is UB?

It is not. std::memcpy interprets the object as an array of bytes, which is one of the exceptional cases above, so there is no UB. You still have UB if you attempt to read the indeterminate copy (unless the exceptions above apply).
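A minimal sketch of that distinction, assuming the `MathProblem` struct from the question; `copy_bytes` is a hypothetical helper name, not a standard function. The bytes of `answer` are copied without `answer` itself ever being evaluated as a `double`.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct MathProblem {
  bool finished;
  double answer;  // deliberately left unwritten until finished is true

  MathProblem() : finished(false) { }
};

// memcpy between objects is only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable<MathProblem>::value,
              "MathProblem must be trivially copyable");

// Hypothetical helper: copies the object representation byte by byte.
// An out-parameter is used so that no copy constructor is involved.
inline void copy_bytes(MathProblem& dst, const MathProblem& src) {
  std::memcpy(&dst, &src, sizeof(MathProblem));
}
```

After `copy_bytes(dst, src)`, reading `dst.finished` is fine because it was initialized in the source. Reading `dst.answer` before assigning to it would still be UB, exactly as it would be for `src.answer`.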

why?

The C++ standard doesn't include a rationale for most rules. This particular rule has existed since the first standard. It is slightly stricter than the related C rule which is about trap representations. To my understanding, there is no established convention for trap handling, and the authors didn't wish to restrict implementations by specifying it, and instead opted to specify it as UB. This also has the effect of allowing the optimiser to deduce that indeterminate values will never be read.

I might want to move this object into a container:

Moving an uninitialised object into a container is typically a logic error. It is unclear why you might want to do such a thing.
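One alternative raised in the comments on the question is `std::optional`. The sketch below is an assumption about how that could look, not the asker's design: the `finished` flag and the `answer` member are folded into one `std::optional<double>`, and `finished()` becomes a hypothetical accessor. It requires C++17.

```cpp
#include <cassert>
#include <optional>
#include <vector>

struct MathProblem {
  // Empty until the problem is solved; no indeterminate double exists.
  std::optional<double> answer;

  // Hypothetical accessor replacing the separate bool member.
  bool finished() const { return answer.has_value(); }
};
```

Copying or moving such an object into a container is well defined whether or not `answer` holds a value. The trade-off is that a premature access is no longer something valgrind reports as an uninitialized read; `answer.value()` throws `std::bad_optional_access` instead, while `*answer` on an empty optional is still UB.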

It is not an uninitialized object, it is an object that has a trivially-copyable member that was not initialized yet (aka, has an indeterminate value) and is (therefore) guaranteed not to be used; I'd say 'read' - but then I have to define that making a copy isn't reading. Imho, it isn't. Even trap values should be copyable to the same type imho without triggering a trap: duplicating data isn't USING it. The object itself will have other members that ARE initialized (otherwise it can never be known when the indeterminate member may be read). — Carlo Wood, Mar 24 '21 at 07:20
@CarloWood The rule that makes indeterminate value UB doesn't say "using the value", it says "If an indeterminate value is produced by an evaluation". — eerorika, Mar 24 '21 at 09:23
Doing: `double x, y; y = x;` is hardly an evaluation, is it? — Carlo Wood, Mar 24 '21 at 15:07
@CarloWood: In what sense would the statement `y=x;` not evaluate `x`? — supercat, Mar 24 '21 at 17:32
@supercat it makes a copy of a trivially copyable object. It should be the same as `std::memcpy(&y, &x, sizeof(double));` -- it is equivalent, so it is insane if the standard won't make an exception that this is NOT UB even when x has an indeterminate value (the `y = x`, that is; the memcpy isn't UB, I understand). — Carlo Wood, Mar 25 '21 at 16:11
@CarloWood: There are platforms where the fastest way to perform a pointer or floating-point assignment might trap for some bit values, and I think it is reasonable to say that implementations targeting such platforms may perform assignments of those types--but not of structures containing them--in such fashions, and that's how I interpret C89. C89 didn't say that all use of indeterminate values resulted in UB, but rather recognized the use of indeterminate values as being among the actions that may cause UB. — supercat, Mar 25 '21 at 16:25
@CarloWood: Note that when C89 characterized an action as UB, that meant that the action could not be regarded as 100% portable, but was not intended to imply that implementations shouldn't be expected to behave predictably in cases where doing so would obviously make sense. If it was obvious that implementations should process some constructs meaningfully in many but not all cases, but different implementations should process them meaningfully in different cases, there was no need for the Standard to expend ink enumerating all the situations that implementations should obviously support. — supercat, Mar 25 '21 at 16:31
@CarloWood: Unfortunately, no version of the C nor C++ Standard has ever made any attempt to distinguish constructs which should behave predictably on general-purpose implementations for common platforms, but might behave unpredictably on some others, from those which even general-purpose implementations for commonplace platforms should not be expected to process meaningfully, thus causing some compiler vendors to treat nonsensically many constructs which commonplace implementations were intended to process meaningfully. — supercat, Mar 25 '21 at 17:00

supercat · Answer 2 · 2021-03-23T17:52:11.410

The design of the C++ Standard was heavily influenced by the C Standard, whose authors (according to the published Rationale) intended and expected that implementations would, on a quality-of-implementation basis, extend the semantics of the language by meaningfully processing programs in cases where it was clear that doing so would be useful, even if the Standard didn't "officially" define the behavior of those programs. Consequently, both standards place more priority upon ensuring that they don't mandate behaviors in cases where doing so might make some implementations less useful, than upon ensuring that they mandate everything that should be supported by quality general-purpose implementations.

There are many cases where it may be useful for an implementation to extend the semantics of the language by guaranteeing that using memcpy on any valid region of storage will, at worst, behave in a fashion consistent with populating the destination with some possibly-meaningless bit pattern with no outside side effects, and few if any where it would be either easier or more useful to have it do something else. The only situations where anyone should care about whether the behavior of memcpy is defined in a particular situation involving valid regions of storage would be those in which some alternative behavior would be genuinely more useful than the commonplace one. If such situations exist, compiler writers and their customers would be better placed than the Committee to judge which behavior would be most useful.

As an example of a situation where an alternative behavior might be more useful, consider code which uses memcpy to copy a partially-written structure, and then uses that copy to make two further copies of the structure. In some cases, having the compiler only write the parts of the two destination structures which had been written in the original may improve efficiency, but that behavior would be observably different from having the first memcpy behave as though it stores some bit pattern to its destination. Note that while such a change would not adversely affect a program's overall behavior if no copies of the uninitialized parts of the structure are ever used in a way that would affect behavior, the Standard has no nice way of distinguishing scenarios that could or could not occur under such a model, and thus leaves all such scenarios undefined.
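The scenario described above might look like the following sketch. `Record` and `duplicate_twice` are hypothetical names introduced only for illustration; the code shows the copy chain itself, not any particular compiler's treatment of it.

```cpp
#include <cassert>
#include <cstring>

struct Record {
  int id;       // written before any copying happens
  int payload;  // deliberately left unwritten in the source
};

// Hypothetical helper: copies src into a temporary, then copies the
// temporary into two destinations.
inline void duplicate_twice(const Record& src, Record& first, Record& second) {
  Record temp;
  std::memcpy(&temp, &src, sizeof(Record));
  std::memcpy(&first, &temp, sizeof(Record));
  std::memcpy(&second, &temp, sizeof(Record));
}
```

If the first memcpy behaved as though it stored a definite bit pattern, `first.payload` and `second.payload` would have to hold identical bytes. A compiler that writes only the parts known to be initialized could leave them different, which is the observable difference the paragraph above refers to. Only `id` can be relied upon in both copies.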
