Does moving non-POD C++ objects with memcpy always invoke Undefined Behavior?

Question

Specifically, I am interested in the case when:

1. It is known that there are no external pointers to the object (nor to any of its members).
2. The object contains no internal self-references.
3. The source object's destructor is guaranteed not to be invoked.

It would seem that under such circumstances objects should be memcpy-movable, even if they have user-defined constructors, destructors, or virtual functions. However, is this still considered UB, which an overzealous compiler may take as an invitation to format my hard drive?

Edit: Please note that I am asking about destructive moving, not copying.

And yes, I am aware of is_trivially_copyable and others. However, is_trivially_copyable covers only a small fraction of C++ classes, whereas the situation described above is extremely common in practice.
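To make the distinction concrete, here is a minimal sketch of a destructive move ("relocation") of the kind the question describes. `relocate` is a hypothetical helper, not a standard facility. It uses `std::memcpy` only when `std::is_trivially_copyable` holds, which is the case the standard explicitly permits. Every other type, including ones with user-provided destructors or virtual functions, falls back to move construction followed by an explicit destructor call. The memcpy branch assumes C++20 semantics, where `std::memcpy` implicitly creates objects of implicit-lifetime types in the destination storage.

```cpp
#include <cassert>
#include <cstring>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper (not a standard facility): relocate *src into raw,
// suitably aligned storage at dst. Afterwards *src must be treated as dead:
// the caller must not use it or run its destructor again.
template <typename T>
T* relocate(T* src, void* dst) {
    assert(src != nullptr && dst != nullptr);
    if constexpr (std::is_trivially_copyable_v<T>) {
        // Byte-wise copy. Assumes C++20, where std::memcpy implicitly creates
        // an object of an implicit-lifetime type in the destination storage.
        std::memcpy(dst, src, sizeof(T));
        // A trivially copyable type has a trivial destructor, so the source
        // needs no destructor call to end its life.
        return std::launder(static_cast<T*>(dst));
    } else {
        // Portable path for every other type (user-provided destructors,
        // virtual functions, ...): move-construct, then destroy the source.
        T* moved = ::new (dst) T(std::move(*src));
        src->~T();
        return moved;
    }
}
```

The fallback branch is what the standard actually guarantees for non-trivially-copyable types. Whether the memcpy branch could safely be widened to cover such types is exactly what the question asks.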

All [trivially copyable types](http://en.cppreference.com/w/cpp/concept/TriviallyCopyable) can be copied with `memcpy`. Some trivially copyable types are not [POD types](http://en.cppreference.com/w/cpp/concept/PODType). This doesn't answer what you're really interested in. — , Mar 12 '16 at 07:27
@curiousguy: In practice, most C++ classes satisfy (2). But yes, to be sure one would have to examine the source. — user1411900, Mar 13 '16 at 01:55
1) Sometimes circular linked lists don't allocate a separate "pivot" and link directly to the object. 2) The compiler could use internal pointers in polymorphic objects. — curiousguy, Mar 13 '16 at 14:06
@user1411900 I have refined my answer. Please let me know if I still haven't answered your question, or if you think I am mistaken. — Joseph Thomson, Mar 14 '16 at 10:04

Peter · Answer 1 · 2016-03-12T12:12:13.807


Before C++11, yes, moving a non-POD type using memcpy() would invoke undefined behaviour.

Since C++11, the definitions have been refined, so that is not necessarily true. The following is for C++11 or later.

POD is equivalent to being both "trivial" (which essentially means "can be statically initialised") and "standard-layout" (which means a number of things, including no virtual functions, having the same access control for all non-static members, having no members which are not standard-layout, no base classes of the same type as the first non-static member, and a few other properties).

It is the "trivially copyable" property which allows an object to be copied using memcpy(), as pointed out by Joseph Thomson in comments. A "trivial" type is trivially copyable, but the reverse is not true (e.g. a class might have a non-trivial default constructor - which makes it non-trivial - but still be trivially copyable). It is also possible for a type to be trivial but not standard-layout (which means it is not POD, as a POD type has both trivial AND standard-layout properties).

The trivial property can be tested using std::is_trivial<type> or (for copying) std::is_trivially_copyable<type>. The standard-layout property can be tested using std::is_standard_layout<type>. These are declared in standard header <type_traits>.
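Here is a short sketch of the three properties, checked at compile time with the traits named above. The `_v` forms are C++17 shorthands for `::value`, and the struct names are made up for illustration.

```cpp
#include <type_traits>

// POD: trivial and standard-layout.
struct Plain {
    int a;
    int b;
};

// User-provided default constructor, so not trivial; copying is still trivial.
struct DefaultInit {
    int x;
    DefaultInit() : x(42) {}
};

// Trivial, but non-static data members with different access control
// mean it is not standard-layout (and therefore not POD).
struct MixedAccess {
public:
    int a;
private:
    int b;
};

static_assert(std::is_trivial_v<Plain> && std::is_standard_layout_v<Plain>);
static_assert(!std::is_trivial_v<DefaultInit>);
static_assert(std::is_trivially_copyable_v<DefaultInit>);
static_assert(std::is_trivial_v<MixedAccess>);
static_assert(!std::is_standard_layout_v<MixedAccess>);
```

Only `Plain` is POD, but all three are trivially copyable. That last property is the one that licenses `memcpy`.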

edited Mar 12 '16 at 12:12

answered Mar 12 '16 at 11:11

Peter


Actually, it's the "trivially copyable" property which allows an object to be copied using `memcpy`. Using "triviality" as your criterion is overly strict. All trivial types are trivially copyable, but not all trivially copyable types are trivial. Specifically, a type needs a trivial default constructor to be trivial, but not to be copied using `memcpy`. – Joseph Thomson Mar 12 '16 at 12:01
"POD is equivalent to being both "trivial" and "standard-layout"" -- Almost, but not exactly. If a type is trivial and standard-layout, but has a non-POD data member, it itself is not a POD type. And that can happen when a type is trivial even though it contains a non-trivial data member. – Mar 12 '16 at 12:17
@Peter: Trivially copyable would imply that an object can be copied with memcpy so as to yield another object of the same type. What if the requirement is merely to copy the underlying bytes so as to capture a state that can be inspected later as a sequence of bytes, rather than an object? – supercat Mar 14 '16 at 19:32
@supercat I can't see any reason why you couldn't inspect the sequence of bytes. It's just a sequence of bytes after all. However, whether you can garner any useful information from inspecting the bytes is another question. The layout of the object in memory will be implementation defined, and certainly the standard doesn't seem to require that two equal objects have equal representations in memory. It would probably be best if you just convert back to the original type before performing any inspections. – Joseph Thomson Mar 16 '16 at 01:06
@JosephThomson: Converting back to the original type would require that it be possible to build a live object from a sequence of bytes--something that is not always possible. On the other hand, I would think that if an object contains PODS among its members, that if one captured the bitwise state of the entire object using memcpy, one should be able to "reconstitute" the PODS within it without having to reconstitute the enclosing object. – supercat Mar 16 '16 at 05:29
@supercat I would imagine that, given an object of trivially copyable type, you could copy the underlying bytes of the object into a buffer, and then recover the value of one of its data members by copying the bytes corresponding to the data member from the buffer using the `offsetof` macro. – Joseph Thomson Mar 16 '16 at 11:00
@JosephThomson: The question would be whether there would be any problem using memcpy on a non-trivially-copyable object for the purpose of later extracting trivially-copyable objects contained within it. I see no reason why using memcpy from any kind of source to a byte buffer should cause anything weird to happen, but some compilers use UB as a license for nonsensical behavior, so the distinction between "Behavior which will not be terribly useful in most cases" and "UB" is huge. – supercat Mar 16 '16 at 14:43
@supercat Objects which are not of trivially copyable or standard-layout type certainly cannot have their underlying bytes copied without invoking UB, because there is no guarantee that they occupy contiguous bytes of storage. I guess the question is whether the standard allows objects of standard-layout type to be reinterpreted as arrays of `char` or `unsigned char`. As far as I can tell, it only specifically allows this for objects of trivially copyable type. – Joseph Thomson Mar 16 '16 at 17:49
@JosephThomson: That depends what you mean by "underlying bytes". It is certainly common for objects to use outside storage to hold information which must be accessed using methods, but can a compiler do that for members declared as simple object fields? – supercat Mar 16 '16 at 18:22
@supercat Actually, I may be mistaken. The standard says that any object may be accessed as if it were a `char` or `unsigned char`, and any pointer can be `reinterpret_cast` to `char*` or `unsigned char*`. Objects of standard-layout type occupy contiguous bytes of storage, and though pointer arithmetic is only defined for array objects, the `std::vector` definition specifies that contiguous storage can be accessed like an array. Thus, objects of standard-layout type _can_ have their underlying bytes copied. However, the bytes are not guaranteed to contain the value of the object. – Joseph Thomson Mar 16 '16 at 18:53
@supercat However, if the standard-layout type had a data member of trivially copyable type, its object representation should be accessible within the array at the offset returned by `offsetof`, and this _is_ guaranteed to contain the value of the data member. Thus, you could copy an object of standard-layout type into an array, and extract the values of any trivially copyable data members at a later time. And the standard actually says nothing special about `std::memcpy`, so we can only assume it has a sensible implementation and can be used wherever we would manually copy. – Joseph Thomson Mar 16 '16 at 19:00
@JosephThomson: That would match my expectation, but the answer doesn't make that clear. Overwriting an object's storage via memcpy is bad news, and so is treating a pointer to a memcpy'd copy of an object as a pointer to that type. But what about code that uses memcpy without doing either of those things (e.g. so that the PODS parts of the object can be examined later)? It would be helpful if the answer could definitively say either (1) such limited use of memcpy would not invoke UB, or (2) compiler writers would be within their rights to treat the memcpy as UB even when... – supercat Mar 16 '16 at 19:08
...there would be no logical reason that the Standard shouldn't have defined behavior. – supercat Mar 16 '16 at 19:09
@supercat Yeah, I will modify my answer later if I think my new reasoning is sound. Note that I still think it is UB to `std::memcpy` an object which is of neither trivially copyable nor standard-layout type. – Joseph Thomson Mar 16 '16 at 19:28
@JosephThomson: Do you think it's UB even if the destination is an array of characters, and is only used in a fashion consistent with such? I would think it would be legitimate for a compiler to use part of the storage occupied by a non-trivially-copyable object to maintain a linked list of such objects; modifying any part of the object's storage whose usage is not specified would thus invoke UB, and reading any part of the storage whose usage is not specified would yield Unspecified Values that could change at any time for any reason (each independent act of reading them should... – supercat Mar 16 '16 at 19:46
...probably be regarded as yielding an independent Unspecified Value, but some compiler writers may insist the copy would hold Indeterminate Value rather than unspecified; if code never does anything with those values other than copy them as char, however, behavior would be defined in any case). Is there any evidence that the authors of the Standard intended that copying a not-trivially-copyable object to an array of characters should invoke UB, rather than producing a bunch of bytes whose contents may not be entirely meaningful? – supercat Mar 16 '16 at 19:53
@supercat I've deleted my answer as I am pretty sure it was not correct, but I am not sure what the actual answer to the question is. As to your question, in order to copy an object with `std::memcpy`, it must occupy contiguous bytes of storage, and thus must be of either trivially copyable or standard-layout type. Iterating over non-contiguous bytes produces invalid pointers (if the compiler has strict pointer safety), and dereferencing invalid pointers is categorically undefined behaviour, even if you are copying the bytes to a contiguous array. – Joseph Thomson Mar 22 '16 at 15:59
@JosephThomson: How would one even try to iterate over non-contiguous bytes? If an object isn't trivially copyable, I would expect that it could own storage which isn't reported in `sizeof`, but I would think that if `sizeof foo` yields 56, then it would own a contiguous region of memory from `(char*)&foo` to `((char*)&foo)+55`, regardless of whether it also owns data elsewhere. Is that not the case? How would one find out about other bytes so as to even try to iterate over them? – supercat Mar 22 '16 at 16:38
@supercat I don't think the C++ standard guarantees that is the case. Check the example [here](http://stackoverflow.com/a/29866102/1563039). – Joseph Thomson Mar 22 '16 at 16:47
@supercat And you can't find the other bytes in any portable way. This is why you can only use `std::memcpy` on objects of trivially copyable or standard-layout type, since they are the only objects guaranteed to occupy contiguous bytes of storage. – Joseph Thomson Mar 22 '16 at 16:49
@JosephThomson: What would be the usefulness of `sizeof` on such a type? Even if it happened to represent the stride of array elements, how could code usefully employ that information? – supercat Mar 22 '16 at 18:53
@supercat I think [`alignof`](http://en.cppreference.com/w/cpp/language/alignof) gives you the stride of array elements. In that particular example, `sizeof` will still tell you the number of bytes taken up by the object representation. But the practical utility of `sizeof` for such types is limited, if you're writing standard-compliant code. Of course, you are free to write code that works on your compiler, just don't expect it to work on other compilers, or even on newer versions of yours. – Joseph Thomson Mar 22 '16 at 19:02
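The approach discussed in these comments can be sketched as follows: snapshot an object's bytes, then later extract one member via `offsetof`. To stay on ground everyone in the thread treats as well defined, the hypothetical `Record` type is both trivially copyable and standard-layout. Whether the same works for a standard-layout type that is not trivially copyable is the part the thread leaves open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical example type: trivially copyable and standard-layout.
struct Record {
    int id;
    double value;
};
static_assert(std::is_trivially_copyable_v<Record>);
static_assert(std::is_standard_layout_v<Record>);  // offsetof is only guaranteed for these

// Copy the object representation of r into a byte buffer.
void snapshot(const Record& r, unsigned char (&buf)[sizeof(Record)]) {
    std::memcpy(buf, &r, sizeof(Record));
}

// Later, recover the 'value' member directly from the saved bytes,
// without ever treating the buffer itself as a Record.
double recover_value(const unsigned char (&buf)[sizeof(Record)]) {
    double v;
    std::memcpy(&v, buf + offsetof(Record, value), sizeof v);
    return v;
}
```

The member is read back with a second `memcpy` into a real `double`, rather than through a cast pointer, so no pointer into the buffer is ever dereferenced as a `double`.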

Rob L · Answer 2 · 2016-03-14T17:54:15.247

There is nothing undefined here. If there are virtual functions, then the vtable pointer gets copied too. Not a great idea, but if the source and destination are of the same type, it will work.

The problem is that you need to know the details of everything in the class. Even if there are no pointers, maybe there is a unique id assigned by the constructor, or any of a thousand other things that can't just be copied. Using memcpy is like telling the compiler that you know exactly what you are doing. Make sure that's the case.
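A hypothetical `Tracked` class illustrates the "unique id assigned by the constructor" case. Its copy operations enforce an invariant that a raw byte copy would bypass. The type traits also confirm that neither it nor a class with a virtual function is trivially copyable.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class: each object receives a distinct id from its constructors.
class Tracked {
public:
    Tracked() : id_(next_id_++) {}
    // A copy is a new object, so it draws a fresh id instead of copying one.
    Tracked(const Tracked&) : id_(next_id_++) {}
    // Assignment has no other state to copy and keeps this object's own id.
    Tracked& operator=(const Tracked&) { return *this; }
    int id() const { return id_; }

private:
    int id_;
    static inline int next_id_ = 0;  // C++17 inline static data member
};

// The user-provided copy operations make the class non-trivially copyable.
// In practice a byte copy would duplicate id_ and break the "ids are unique" invariant.
static_assert(!std::is_trivially_copyable_v<Tracked>);

// A class with a virtual function is never trivially copyable either.
struct Polymorphic {
    virtual ~Polymorphic() = default;
};
static_assert(!std::is_trivially_copyable_v<Polymorphic>);
```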

Edit: There is a big spread of possible interpretations between "not defined in the C++ standard" and "might format my hard drive with the compiler I'm using." Some clarification follows.

Classic Undefined Behavior

Here is an example of behavior that everyone would probably agree is undefined:

#include <cstdio>

void do_something_undefined()
{
    int i;                 // never initialized
    std::printf("%d", i);  // reads an indeterminate value: undefined behavior
}

Not Defined By C++ Standard

You can use a different, more strict definition of undefined. Take this code fragment:

#include <cstring>
#include <iostream>

struct MyStruct
{
    int a;
    int b;
    MyStruct() : a(1),b(2)
    {
    }
    ~MyStruct()
    {
        std::cout << "Test: Deleting MyStruct" << std::endl;
    }
};

void not_defined_by_standard()
{
    MyStruct x,y;
    x.a = 5;
    std::memcpy(&y, &x, sizeof(MyStruct)); // MyStruct is not trivially copyable
}

Taking the previous posters at their word on the standard references, this use of memcpy is not defined by the C++ standard. It is perhaps theoretically possible for a conforming C++ implementation to add a hidden unique ID to every object of a class with a non-trivial destructor, causing the destructors of x and y to fail. Even if the standard permits this, you can certainly find out whether your particular compiler does it.

I would make a semantic distinction here and call this "not defined" instead of "undefined." One problem is the lawyer-like definition of terms: "Undefined Behavior" in the C++ standard means "not defined in the standard", not "gives an undefined result when using a particular compiler." While the standard may not define it, you can absolutely know how it behaves with your particular compiler. (Note that the cppreference page for std::memcpy says "If the objects are not TriviallyCopyable, the behavior of memcpy is not specified and may be undefined". This says memcpy is unspecified behavior, not undefined behavior, which is kind of my whole point.)

So, again, you need to know exactly what you are doing. If you're writing portable code that needs to survive for years, don't do it.

Why does the C++ standard not like this code?

Simple: The memcpy call above effectively destructs and re-constructs y. It does this without calling the destructor. The C++ standard rightly does not like this at all.
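This sketch shows which part of `MyStruct` the standard objects to. The user-provided destructor by itself makes the type non-trivially copyable, while an otherwise identical struct without it passes the trait (a user-provided default constructor does not matter). `MyStructNoDtor` and `defined_by_standard` are hypothetical names used for illustration. The sketch also shows the portable replacement for the memcpy call, which is plain copy assignment.

```cpp
#include <cassert>
#include <iostream>
#include <type_traits>

// Same shape as MyStruct above.
struct MyStruct {
    int a;
    int b;
    MyStruct() : a(1), b(2) {}
    ~MyStruct() { std::cout << "Test: Deleting MyStruct" << std::endl; }
};

// Hypothetical variant: identical except for the user-provided destructor.
struct MyStructNoDtor {
    int a;
    int b;
    MyStructNoDtor() : a(1), b(2) {}
};

// The destructor alone is what removes trivial copyability.
static_assert(!std::is_trivially_copyable_v<MyStruct>);
static_assert(std::is_trivially_copyable_v<MyStructNoDtor>);

// Portable replacement for memcpy(&y, &x, sizeof(MyStruct)).
void defined_by_standard(MyStruct& y, const MyStruct& x) {
    y = x;  // implicitly-declared copy assignment copies a and b
}
```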

The terms "unspecified behaviour" and "undefined behaviour" have very specific definitions in the C++ standard. A well-formed program can have unspecified behaviour, but not undefined behaviour. Undefined behaviour is literally that which is not defined by the standard (or the standard explicitly says is undefined behaviour). The question to ask yourself is, "Could I write a standard-compliant compiler which generates code to format my hard drive when it encounters construct X?" If the answer is "yes", then X has undefined behaviour (unless X is actually code which formats your hard drive). — Joseph Thomson, Mar 14 '16 at 19:04
Yes, the terms have specific meanings. There is definitely confusion (as evidenced by this thread as well as cppreference) about whether memcpy on non-trivially-copyable types is undefined or unspecified behavior. That's why I tried to give a practical answer that avoided the terms. I think the MyStruct example sums it up: Is the compiler allowed to make that code format the hard drive because of my debugging destructor? Put another way, is the compiler allowed to change the memory layout of an object based on the presence of a non-virtual destructor? I would guess that the answer is "no." — Rob L, Mar 14 '16 at 20:09
Its memory layout is not the issue here. It is not trivially copyable, so the compiler is not obliged to do anything in particular if its underlying bytes are copied. I know that in practice any sensible compiler would let it be `memcpy`'d, but I'm talking about technicalities and overzealous compilers, as the OP specified. — Joseph Thomson, Mar 14 '16 at 20:38
Sometimes, the standard doesn't say how something must be, but the rest of the standard is written in such a way that alternatives are not possible. I suspect that, if one delved into the standard deep enough, one would find it impossible to write a compliant compiler for which MyStruct above cannot be memcpy'd. That's why I mention the memory layout, because if the compiler is not allowed to vary the layout based on a non-virtual destructor, then memcpy must behave the same in both cases. — Rob L, Mar 14 '16 at 20:43
If the behaviour of some operation is not outlined in the standard, it by definition has undefined behaviour. And by definition, undefined behaviour can do anything (it's undefined). The moment undefined behaviour comes into play, all bets are off. Your compiler can literally do anything and not violate the standard. In this case, it doesn't matter that the layout is contiguous in memory, because the standard doesn't require that copying the underlying bytes has to result in a valid object. It doesn't require that this has to result in anything. This is the nature of undefined behaviour. — Joseph Thomson, Mar 15 '16 at 08:26

