memcpy a non-POD object

Question

For objects of POD types it is guaranteed by the standard that when you memcpy the contents of your object into an array of char or unsigned char, and then memcpy the contents back into your object, the object will hold its original value.

Now note that there is no such guarantee for objects of non-POD types. So my question is why is this so?

Source of the text above

memcpy can't follow references/pointers, or call copy constructors on complex objects. All it does is take a block of contiguous memory, and copy it someplace else. — fileoffset, Oct 07 '14 at 03:16
What if I wanted a shallow copy? I don't see what a non-POD type can contain that would prevent a byte-wise copy. — Kam, Oct 07 '14 at 03:17
@Kam: consider copying a `std::string` via `memcpy`. now there are two objects referring to the same buffer. each of them will try to deallocate the buffer. — Cheers and hth. - Alf, Oct 07 '14 at 03:20
@Kam: A "shallow" copy is often not valid for a non-POD type. For example, if you copied the bytes of a `std::vector` (containing a pointer to the memory it owned), then modified the original (causing it to reallocate that memory), then copied the bytes back, it would now be invalid, referring to the memory it owned before reallocation. So you'd no longer have the original "value", but something broken instead. — Mike Seymour, Oct 07 '14 at 03:21
Wow Mike, nice answer! got it! You always end up answering some of my questions. — Kam, Oct 07 '14 at 03:26
@MikeSeymour, the concept of trivially copyable class is not related to dynamic allocation (the standard does not impose any requirements about it). The reason it is copyable is because it must occupy contiguous bytes of storage. — imreal, Oct 07 '14 at 03:36
@imreal: All objects, trivial or not, occupy contiguous bytes. I was giving an example of how copying the bytes of a non-trivial type can give invalid behaviour. — Mike Seymour, Oct 07 '14 at 03:42
@MikeSeymour maybe in practice but the standard does not enforce it. Plus you could have dynamic allocation on a trivially copyable object. — imreal, Oct 07 '14 at 03:45
@MikeSeymour what you say is true for any old POD that contains a pointer e.g. `struct str { char* data; int size; }`. This is not the reason. — n. m. could be an AI, Oct 07 '14 at 03:45
@imreal: What doesn't the standard enforce? An object being in contiguous bytes? That's the very definition of an object: An *object* is a region of storage. — Mike Seymour, Oct 07 '14 at 03:51
@imreal: No thanks, this is getting far too metaphysical for me. I'll leave others to argue about whether a "region of storage" can mean anything other than a contiguous range of bytes. — Mike Seymour, Oct 07 '14 at 03:58
lolling at all the noise about contiguity. It _seems_ to me that the real reason that `string`, `vector`, _et al._ are not trivially copyable is because they have non-trivial (user-defined) constructors/destructors - which can acquire/release resources - that, if shallow-copied, will cause horror as multiple objects try to (A) access a resource the destination hasn't 'earned' and (B) release it multiple times. I might well suppose that a similar rationale was at least _part_ of the decision to disallow custom c/dtors in trivially copyable classes. — underscore_d, Jun 16 '16 at 15:47
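
The distinction argued over in these comments can be checked with type traits. This is a minimal sketch, assuming C++11 or later; `str` is the struct from n. m.'s comment, and the helper function names are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// The struct from the comment above: it holds a pointer, yet its copy/move
// operations and destructor are all implicitly declared and trivial.
struct str { char* data; int size; };

// Hypothetical helpers that report what the compiler says about each type.
constexpr bool pointer_struct_is_trivially_copyable() {
    return std::is_trivially_copyable<str>::value;
}
constexpr bool string_is_trivially_copyable() {
    return std::is_trivially_copyable<std::string>::value;
}
constexpr bool vector_is_trivially_copyable() {
    return std::is_trivially_copyable<std::vector<int>>::value;
}

// Holding a pointer does not disqualify a type; user-provided copy
// operations or a non-trivial destructor do.
static_assert(pointer_struct_is_trivially_copyable(), "plain struct with a pointer");
static_assert(!string_is_trivially_copyable(), "std::string manages a resource");
static_assert(!vector_is_trivially_copyable(), "std::vector manages a resource");
```

Whether a bytewise copy of `str` is useful is a separate question: both copies would point at the same `data`, and nothing in the language records which one owns it.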

imreal · Accepted Answer · 2014-10-07T04:10:12.100

4

The reason a trivially copyable class (C++11 mostly uses the concepts trivial class and standard-layout class instead of POD) can be memcpy'ed is not related to dynamic allocation as other answers/comments suggest. Granted, if you do try a shallow copy of a type that has dynamic allocation, you are inviting trouble. But you could very well have a type with a pointer that does dynamic allocation in a user-provided constructor and still qualify as a trivially copyable class, as long as its copy/move operations and destructor remain trivial.
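
To make that last point concrete, here is a minimal sketch, assuming C++11 or later. `Buffer` and `memcpy_shares_allocation` are hypothetical names that do not appear in the answer. The type allocates in a user-provided constructor, yet the compiler still reports it as trivially copyable; as the comments below point out, it is not a trivial class, because it has no trivial default constructor.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical type: allocates in a user-provided constructor but declares
// no copy/move operations and no destructor, so those remain trivial.
struct Buffer {
    int* data;
    explicit Buffer(int n) : data(new int[n]()) {}
};

static_assert(std::is_trivially_copyable<Buffer>::value,
              "bytewise copy is permitted for Buffer");
static_assert(!std::is_trivial<Buffer>::value,
              "Buffer has no trivial default constructor");

// Copies one Buffer over another with memcpy and reports whether both
// objects now refer to the same allocation (a shallow copy).
bool memcpy_shares_allocation() {
    Buffer a(4);
    Buffer b(1);
    int* b_original = b.data;             // remember b's own allocation
    std::memcpy(&b, &a, sizeof(Buffer));  // b.data now equals a.data
    bool shared = (a.data == b.data);
    delete[] a.data;                      // free the shared allocation once
    delete[] b_original;                  // free b's original allocation
    return shared;
}
```

The copy is well-defined, but it is still shallow: the caller has to decide which object is responsible for releasing the memory.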

The actual reason a memcpy can be guaranteed is that trivially copyable (and also standard-layout) types are required to occupy contiguous bytes of storage whereas other objects are not.

N3690

1.8.5 Unless it is a bit-field (9.6), a most derived object shall have a non-zero size and shall occupy one or more bytes of storage. Base class subobjects may have zero size. An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage.
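
The round trip that the question asks about can be sketched as follows. This is only an illustration, assuming C++11 or later; `Point` and `round_trip_restores_value` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type used only for this illustration.
struct Point { int x; double y; };

static_assert(std::is_trivially_copyable<Point>::value,
              "Point may be copied bytewise");

// Copies the object into a byte array, overwrites the object, then copies
// the bytes back and checks that the original value has been restored.
bool round_trip_restores_value() {
    Point p{3, 4.5};
    unsigned char bytes[sizeof(Point)];
    std::memcpy(bytes, &p, sizeof(Point));  // object -> byte array
    p.x = 0;
    p.y = 0.0;                              // clobber the object
    std::memcpy(&p, bytes, sizeof(Point));  // byte array -> same object
    return p.x == 3 && p.y == 4.5;
}
```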

edited Oct 07 '14 at 04:10

answered Oct 07 '14 at 03:50


@BenVoigt you're right, but trivial class simply adds 2 more requirements so it is a bit more general. – imreal Oct 07 '14 at 04:05
Nope. *trivially copyable* is more general (more types qualify). It also would save you from the mistake concerning "(as long as it has a default constructor)" – Ben Voigt Oct 07 '14 at 04:06
@BenVoigt oops you're right, more requirements mean being more restrictive. I'll change it. – imreal Oct 07 '14 at 04:08
BTW trivial class requires a trivial default constructor. Just having a zero-argument, or even defaulted default constructor, is not enough. – Ben Voigt Oct 07 '14 at 04:29

A concrete example of the non-contiguous storage: Imagine a 16-bit segmented architecture where the vtable pointer is a near pointer ... memcpy'ing the object to a different segment will break the vtable pointer. – M.M Dec 21 '15 at 22:20
Do you think, then, that it's well-defined behaviour to `memcpy` a class that is _standard-layout_ but not _trivially copyable_? I'm hoping so. – underscore_d Jul 12 '16 at 10:22
@underscore_d No, there are some Standard Library types that are standard layout like *Mutex* that definitely can't be copied with `memcpy`. Since they occupy contiguous memory it is guaranteed the copied object will have the same contents but not the same behavior. – imreal Jul 12 '16 at 15:00

score 2 · Answer 2 · answered Oct 07 '14 at 03:22

I'm not sure that strictly speaking the standard does allow you to memcpy into an array of char and back again, although you'd certainly get away with it with PODs.

But things get murkier when you start looking at all the complicated things C++ can do. Consider the following:

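#include <memory>
#include <utility>

struct ex1 {};  // complete type, so unique_ptr<ex1> knows how to delete it

struct ex2
{
    // unique_ptr is move-only, so the parameter has to be moved into the member
    ex2(std::unique_ptr<ex1> ptr) : member{std::move(ptr)} {}
private:
    std::unique_ptr<ex1> member;
};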

The struct ex2 is move-only because it has a move-only member variable. So what happens if we use memcpy to construct a bit-for-bit identical copy of an ex2 instance? We end up with two objects which both think they own the member pointer. What happens when the second one of these gets deleted? You get the idea.
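
The compiler's view of `ex2` can be queried directly. The sketch below repeats the struct so that it is self-contained and assumes C++11 or later; the three helper functions are hypothetical names added for illustration.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

struct ex1 {};

struct ex2
{
    ex2(std::unique_ptr<ex1> ptr) : member{std::move(ptr)} {}
private:
    std::unique_ptr<ex1> member;
};

// Hypothetical helpers exposing the relevant type traits.
constexpr bool ex2_is_trivially_copyable() {
    return std::is_trivially_copyable<ex2>::value;
}
constexpr bool ex2_is_copy_constructible() {
    return std::is_copy_constructible<ex2>::value;
}
constexpr bool ex2_is_move_constructible() {
    return std::is_move_constructible<ex2>::value;
}

// The unique_ptr member gives ex2 a non-trivial destructor and a deleted
// copy constructor, so the memcpy guarantee does not cover it.
static_assert(!ex2_is_trivially_copyable(), "ex2 owns a resource");
static_assert(!ex2_is_copy_constructible(), "ex2 cannot be copied");
static_assert(ex2_is_move_constructible(), "ex2 can be moved");
```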

A byte array doesn't own anything. The question is why you can't memcpy a non-POD to a byte array and then copy it back, not why you can't memcpy it to another non-POD and then use that other non-POD. Besides, the language has no concept of ownership; it's a higher-level construct that you can apply to both smart pointers and regular pointers (which are POD). — n. m. could be an AI, Oct 07 '14 at 04:07

score 2 · Answer 3 · answered Oct 07 '14 at 03:42

A particular case where this kind of serialization fails is on types with virtual members. If a type has a virtual member then it has a vtable. This table will contain pointers to the implementation for each of the virtual members.

If the serialized data in the char array crosses a process boundary (you send it over the network, or you write it to disk and read it back in from a different process) then the vtable pointers you wrote out may no longer be valid, and invoking any virtual member would cause undefined behavior.
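
One way to catch this at compile time is to guard the bytewise copy with a type trait. The following is a minimal sketch, assuming C++11 or later; `Widget`, `copy_to_bytes`, and the other helper names are hypothetical and not taken from this answer.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical type with a virtual member function.
struct Widget {
    virtual int id() const { return 1; }
    int value = 0;
};

constexpr bool widget_is_polymorphic() {
    return std::is_polymorphic<Widget>::value;
}
constexpr bool widget_is_trivially_copyable() {
    return std::is_trivially_copyable<Widget>::value;
}

// Hypothetical guarded helper: refuses to compile for any type that is not
// trivially copyable, so copy_to_bytes(Widget{}, buffer) is rejected.
template <class T>
void copy_to_bytes(const T& object, unsigned char* out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable to be copied bytewise");
    std::memcpy(out, &object, sizeof(T));
}

// Exercises the helper with a type that does qualify.
bool guarded_copy_works_for_int() {
    int n = 42;
    unsigned char bytes[sizeof(int)];
    copy_to_bytes(n, bytes);
    int back = 0;
    std::memcpy(&back, bytes, sizeof(int));
    return back == 42;
}
```

The guard only rejects types the language already excludes from the guarantee. It does not make the resulting bytes portable across processes or machines.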

If you're crossing process boundaries, then wherever you're copying is definitely not "back into your object." It's an entirely different object. That's not the subject of this question. — Rob Kennedy, Oct 20 '14 at 19:52
