43

Say I have a class, something like the following;

class MyClass
{
public:
  MyClass();
  int a,b,c;
  double x,y,z;
};

#define  PageSize 1000000

MyClass Array1[PageSize],Array2[PageSize];

If my class has not pointers or virtual methods, is it safe to use the following?

memcpy(Array1,Array2,PageSize*sizeof(MyClass));

The reason I ask, is that I'm dealing with very large collections of paged data, as decribed here, where performance is critical, and memcpy offers significant performance advantages over iterative assignment. I suspect it should be ok, as the 'this' pointer is an implicit parameter rather than anything stored, but are there any other hidden nasties I should be aware of?

Edit:

As per sharptooths comments, the data does not include any handles or similar reference information.

As per Paul R's comment, I've profiled the code, and avoiding the copy constructor is about 4.5 times faster in this case. Part of the reason here is that my templated array class is somewhat more complex than the simplistic example given, and calls a placement 'new' when allocating memory for types that don't allow shallow copying. This effectively means that the default constructor is called as well as the copy constructor.

Second edit

It is perhaps worth pointing out that I fully accept that use of memcpy in this way is bad practice and should be avoided in general cases. The specific case in which it is being used is as part of a high performance templated array class, which includes a parameter 'AllowShallowCopying', which will invoke memcpy rather than a copy constructor. This has big performance implications for operations such as removing an element near the start of an array, and paging data in and out of secondary storage. The better theoretical solution would be to convert the class to a simple structure, but given this involves a lot of refactoring of a large code base, avoiding it is not something I'm keen to do.

Community
  • 1
  • 1
SmacL
  • 22,555
  • 12
  • 95
  • 149
  • 2
    You should be careful with interpretation of the data type being bitwise copyable. It's not only about whther it is a pointer or an integer, it's also about ownership semantics. For example, Win32 API handles are often not pointers, still bitwise copying them is usually wrong. – sharptooth Jun 11 '10 at 08:44
  • 1
    Do you actually have evidence that normal copying (iteration and copy constructor) is incurring any kind of performance penalty, or is this just a hunch ? I would urge you to *profile* this if you haven't already, as even the most naïve copy implementation should be able to max out the available DRAM bandwidth in your system. – Paul R Jun 11 '10 at 08:45
  • @sharptooth, very good point. In this case there are no handles or other references to anything dynamic, it is basically 3d coordinate data with attributes. – SmacL Jun 11 '10 at 08:50
  • @Paul R, yes I've profiled, and avoiding the copy constructor is about 4.5 times faster in this case. Part of the reason here is that my templated array class is somewhat more complex than the simplistic example given, and calls a placement 'new' when allocating memory for types that don't allow shallow copying. – SmacL Jun 11 '10 at 08:58
  • See also: http://stackoverflow.com/questions/1522789 – johnsyweb Jun 11 '10 at 09:09
  • 1
    For the simple example of the OP, my profiling showed that memcpy is only ~2 times faster (on GCC 4.4). – smerlin Jun 11 '10 at 09:25
  • 7
    _only_ 2X is still pretty significant if this is a performance-critical area. – Mr. Boy Jun 11 '10 at 10:40
  • @smerlin, interesting result. I'm on visual studio 2008, but as mentioned my case is more complex than the simple example given, with placement new and explicit delete coming into play if the array class thinks it's dealing with objects. Even 2x slower would be cause for concern for very large spatial data sets. – SmacL Jun 11 '10 at 10:40
  • Surely, if you're looking for absolute performance, then using templates is going to cause issues - you're trying to specialise template types to get performance. In which case, forget the templates and just optimise the data storage for each specific case. You'll probably get better results as you can deviate more for each specific type. It's hard to optimise without seeing the bigger picture. – Skizz Jun 11 '10 at 14:02
  • I would recommend changing the data structure to avoid this much copying, however it seems it's not a popular idea. – Matthieu M. Jun 11 '10 at 14:22
  • did some more testing, implementing a canonical copy assignment operator, memcpy was only 40% faster.. check the code here: http://codepad.org/TUpPGOLy with user-defined copy-assignment: http://codepad.org/KgVihmsu – smerlin Jun 11 '10 at 14:49
  • @smerlin - Could you change your array size from 1,000 to 1,000,000 an cut your runs from 90,000 to 90. I suspect the affect of the optimization is being lost in ratio of the number of runs to array size, but it is just a suspicion. – SmacL Jun 11 '10 at 15:15
  • did some more testing2: the more data members you have, the slower a user-defined copy assigment will perform. my tests indicated that the implicit defined copy assignment op uses memcpy internally, since it yielded exactly the same timings as a `MyClass& operator=(const MyClass& t){memcpy(this,&t,sizeof(*this));}` – smerlin Jun 11 '10 at 15:21
  • @smerlin, many thanks for the work you've put into this. Much appreciated! – SmacL Jun 11 '10 at 15:45

11 Answers11

19

According to the Standard, if no copy constructor is provided by the programmer for a class, the compiler will synthesize a constructor which exhibits default memberwise initialization. (12.8.8) However, in 12.8.1, the Standard also says,

A class object can be copied in two ways, by initialization (12.1, 8.5), including for function argument passing (5.2.2) and for function value return (6.6.3), and by assignment (5.17). Conceptually, these two operations are implemented by a copy constructor (12.1) and copy assignment operator (13.5.3).

The operative word here is "conceptually," which, according to Lippman gives compiler designers an 'out' to actually doing memberwise initialization in "trivial" (12.8.6) implicitly defined copy constructors.

In practice, then, compilers have to synthesize copy constructors for these classes that exhibit behavior as if they were doing memberwise initialization. But if the class exhibits "Bitwise Copy Semantics" (Lippman, p. 43) then the compiler does not have to synthesize a copy constructor (which would result in a function call, possibly inlined) and do bitwise copy instead. This claim is apparently backed up in the ARM, but I haven't looked this up yet.

Using a compiler to validate that something is Standard-compliant is always a bad idea, but compiling your code and viewing the resulting assembly seems to verify that the compiler is not doing memberwise initialization in a synthesized copy constructor, but doing a memcpy instead:

#include <cstdlib>

class MyClass
{
public:
    MyClass(){};
  int a,b,c;
  double x,y,z;
};

int main()
{
    MyClass c;
    MyClass d = c;

    return 0;
}

The assembly generated for MyClass d = c; is:

000000013F441048  lea         rdi,[d] 
000000013F44104D  lea         rsi,[c] 
000000013F441052  mov         ecx,28h 
000000013F441057  rep movs    byte ptr [rdi],byte ptr [rsi] 

...where 28h is the sizeof(MyClass).

This was compiled under MSVC9 in Debug mode.

EDIT:

The long and the short of this post is that:

1) So long as doing a bitwise copy will exhibit the same side effects as memberwise copy would, the Standard allows trivial implicit copy constructors to do a memcpy instead of memberwise copies.

2) Some compilers actually do memcpys instead of synthesizing a trivial copy constructor which does memberwise copies.

John Dibling
  • 99,718
  • 31
  • 186
  • 324
  • Interesting stuff. So I'm guessing even with this in place, the performance gain comes from a single 'rep movs' for the entire array as opposed to the above code in a loop. I wonder what the assembler output from other compilers would look like. It seems like a pretty obvious optimization. – SmacL Jun 11 '10 at 11:34
  • Thanks for the ARM reference @john-dibling – interestedparty333 Dec 06 '18 at 19:52
12

Let me give you an empirical answer: in our realtime app, we do this all the time, and it works just fine. This is the case in MSVC for Wintel and PowerPC and GCC for Linux and Mac, even for classes that have constructors.

I can't quote chapter and verse of the C++ standard for this, just experimental evidence.

Crashworks
  • 40,496
  • 12
  • 101
  • 170
  • Accepting this answer until I see some hard reference material to the contrary. Yes, it is bad general practice and could lead to potential future maintenance issues, but in this specific case performance requirements are paramount. – SmacL Jun 11 '10 at 10:43
  • 2
    If you know for sure exactly which compiler will be used and exactly which hardware you'll run on, generality can be less of a concern. – Crashworks Jun 11 '10 at 10:45
  • 23
    I love it when people accept answers that tell them exactly what they want to hear. –  Jun 11 '10 at 10:49
  • @Neil, the question was asked to see if there were good reasons why memcpy wouldn't work. The other answers didn't provide that, they simply gave good reasons as to why it was a bad idea, which I fully accept. Performance isn't a wooly requirement in this case, it is of primary importance. – SmacL Jun 11 '10 at 11:08
  • 1
    Agree with Neil, but in this specific case I do believe doing a `memcpy` is technically safe. That said, it is also a bad bad bad programming practice that should virtually never be done. – John Dibling Jun 11 '10 at 11:23
  • 2
    `even for classes that have constructors` breaks the POD type ( unless they are defaulted since c++11 ) of the class and yields UB. How can this anwser get so many upvotes not to mention it got accepted ? – clickMe Jan 27 '17 at 15:09
9

You could. But first ask yourself:

Why not just use the copy-constructor that is provided by your compiler to do a member-wise copy?

Are you having specific performance problems for which you need to optimise?

The current implementation contains all POD-types: what happens when somebody changes it?

johnsyweb
  • 136,902
  • 23
  • 188
  • 247
  • 1
    If you have a large array of such structures, it can be faster to use memcpy, which many compilers implement using efficient native instructions (ie, SIMD copies of whole vectors; cache operations to move entire lines at once, etc). – Crashworks Jun 11 '10 at 08:45
  • The copy constructor provided by your compiler does not do bitwise copy. –  Jun 11 '10 at 09:04
  • @Johnsyweb, as per my edits, profiling has shown this to be much slower in the case I'm dealing with. – SmacL Jun 11 '10 at 09:04
  • Updated my answer accordingly. – johnsyweb Jun 11 '10 at 09:06
  • @Neil: According to Lippman, if the class has no explicit copy ctor and the class exhibits bitwise copy semantics (as in PODs), then the compiler very well may not synthesize a copy ctor and do bitwise copy instead. This is apparently backed up in the ARM, but I haven't looked it up yet. – John Dibling Jun 11 '10 at 10:46
  • @John Neither Lippman (which bok?) nor the ARM(come on - way past obsolete!) are standards. –  Jun 11 '10 at 10:48
  • @Neil: I know, but see my post. The Lippman book I'm referring to is "Inside the C++ Object Model" – John Dibling Jun 11 '10 at 11:17
  • "Why not just use the copy-constructor that is provided by your compiler to do a member-wise copy?" As one of many examples, if the object in question contains `union`s with non-trivial ctors or `operator=`s, which generate hopeless ambiguity for a memberwise copy. What you want here is a raw bytewise copy (assuming of course that the members themselves leave the object compliant with `memcpy`). – underscore_d Apr 10 '16 at 16:49
  • @underscore_d: The object in the question contains all POD-types. This is why I posed this as a question! – johnsyweb May 06 '16 at 02:23
  • 1
    @Johnsyweb Sure, in this situation, it's probably best. I was just making a general/rhetorical observation that might apply to the same question in other situations. Probably OT and not phrased too clearly anyway... my bad! – underscore_d May 06 '16 at 09:02
9

Your class has a constructor, and so is not POD in the sense that a C struct is. It is therefore not safe to copy it with memcpy(). If you want POD data, remove the constructor. If you want non-POD data, where controlled construction is essential, don't use memcpy() - you can't have both.

  • Thanks as always for the reply Neil. Can you point me to some references that back this up. I split my class into a POD class, and the derive from it if necessary, I just can't figure out why it would be necessary. – SmacL Jun 11 '10 at 09:13
  • 1
    @ShaneMacLaughlin Search for _trivial constructor_. Yours isn't, because it's user-provided. Only trivial types are safe to copy using `memcpy`. I have no idea why those respectively absurd and tangential higher-voted answers are where they are, when the entire thread comes down to whether or not the class being bitwise-copied is trivial (previously called POD). See: http://stackoverflow.com/q/5430801/2757035 +1 to this inexplicably overlooked answer. – underscore_d Jun 16 '16 at 10:09
8

[...] but are there any other hidden nasties I should be aware of?

Yes: your code makes certain assumptions that are neither suggested nor documented (unless you specifically document them). This is a nightmare for maintenance.

Also, your implementation is basically hacking (if it's necessary it's not a bad thing) and it may depend (not sure on this) on how your current compiler implements things.

This means that if you upgrade compiler / toolchain one year (or five) from now (or just change optimization settings in your current compiler) nobody will remember this hack (unless you make a big effort to keep it visible) and you may end up with undefined behavior on your hands, and developers cursing "whoever did this" a few years down the road.

It's not that the decision is unsound, it's that it is (or will be) unexpected to maintainers.

To minimize this (unexpectedness?) I would move the class into a structure within a namespace based on the current name of the class, with no internal functions in the structure at all. Then you are making it clear you're looking at a memory block and treating it as a memory block.

Instead of:

class MyClass
{
public:
    MyClass();
    int a,b,c;
    double x,y,z;
};

#define  PageSize 1000000

MyClass Array1[PageSize],Array2[PageSize];

memcpy(Array1,Array2,PageSize*sizeof(MyClass));

You should have:

namespace MyClass // obviously not a class, 
                  // name should be changed to something meaningfull
{
    struct Data
    {
        int a,b,c;
        double x,y,z;
    };

    static const size_t PageSize = 1000000; // use static const instead of #define


    void Copy(Data* a1, Data* a2, const size_t count)
    {
        memcpy( a1, a2, count * sizeof(Data) );
    }

    // any other operations that you'd have declared within 
    // MyClass should be put here
}

MyClass::Data Array1[MyClass::PageSize],Array2[MyClass::PageSize];
MyClass::Copy( Array1, Array2, MyClass::PageSize );

This way you:

  • make it clear that MyClass::Data is a POD structure, not a class (binary they will be the same or very close - the same if I remember correctly) but this way it is also visible to programmers reading the code.

  • centralize the usage of memcpy (if you have to change to a std::copy or something else) in two years you do it in a single point.

  • keep the usage of memcpy near the implementation of the POD structure.

utnapistim
  • 26,809
  • 3
  • 46
  • 82
  • In terms of maintenance, my existing array implementation already includes a flag to enable or disable shallow copying. From a maintenance point of view, if there ever comes a time for what ever reason that the memcpy optimization is no longer viable, it is simply a matter of turning it off. – SmacL Jun 11 '10 at 14:14
  • should be `memcpy( a1, a2, count * sizeof(Data) );` – BЈовић Nov 05 '13 at 14:36
  • 2
    "you may end up with undefined behavior on your hands" - Due to the non-trivial constructor, he already **has** UB on his hands. What this really means is that said UB might _manifest detectably_ later. And just because UB isn't currently wrecking things, doesn't mean it should be used. – underscore_d Jun 16 '16 at 10:16
6

You could use memcpy for copying array of POD types. And it will be a good idea to add static assert for boost::is_pod is true. You class is not a POD type now.

Arithmetic types, enumeration types, pointer types, and pointer to member types are POD.

A cv-qualified version of a POD type is itself a POD type.

An array of POD is itself POD. A struct or union, all of whose non-static data members are POD, is itself POD if it has:

  • No user-declared constructors.
  • No private or protected non-static data members.
  • No base classes.
  • No virtual functions.
  • No non-static data members of reference type.
  • No user-defined copy assignment operator.
  • No user-defined destructor.
Kirill V. Lyadvinsky
  • 97,037
  • 24
  • 136
  • 212
3

I will notice that you admit there is an issue here. And you are aware of the potential drawbacks.

My question is one of maintenance. Do you feel confident that nobody will ever include a field in this class that would botch up your great optimization ? I don't, I'm an engineer not a prophet.

So instead of trying to improve the copy operation.... why not try to avoid it altogether ?

Would it be possible to change the data structure used for storage to stop moving elements around... or at least not that much.

For example, do you know of blist (the Python module). B+Tree can allow index access with performances quite similar to vectors (a bit slower, admittedly) for example, while minimizing the number of elements to shuffle around when inserting / removing.

Instead of going in the quick and dirty, perhaps should you focus on finding a better collection ?

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
1

Calling memcpy on non-POD classes is undefined behaviour. I suggest to follow Kirill's tip for assertion. Using memcpy can be faster, but if the copy operation is not performance critical in your code then just use bitwise copy.

zoli2k
  • 3,388
  • 4
  • 26
  • 36
  • 4
    'Either use `memcpy` or bitwise copy.' - er... what? These two terms have identical meaning. You meant _member-wise_, right? – underscore_d Jun 16 '16 at 10:12
1

When talking about the case you're referring to I suggest you declare struct's instead of class'es. It makes it a lot easier to read (and less debatable :) ) and the default access specifier is public.

Of course you can use memcpy in this case, but beware that adding other kinds of elements in the struct (like C++ classes) is not recommended (due to obvious reasons - you don't know how memcpy will influence them).

INS
  • 10,594
  • 7
  • 58
  • 89
  • Yup, I'm thinking of converting my class to a struct, which I then either embed in another class, or derive from. – SmacL Jun 11 '10 at 09:31
  • "Of course you can use memcpy in this case" Nonsense, no, you can't. UB doesn't count. The presence of a user-provided constructor means the class is not trivially copyable, and hence `memcpy`ing it is UB. – underscore_d Jun 16 '16 at 10:11
1

As pointed out by John Dibling, you should not use memcpy manually. Instead, use std::copy. If your class is memcpy-able, std::copy will automatically do a memcpy. It may be even faster than a manual memcpy.

If you use std::copy, your code is readable and it always uses the fastest way to copy. And if you change the layout of your class at a later point so that it is not memcpy-able any more, the code that uses std::copy will not break, while your manual calls to memcpy will.

Now, how do you know whether your class is memcpy-able? In the same way, std::copy detects that. It uses: std::is_trivially_copyable. You can use a static_assert to make sure that this property is upheld.

Note that std::is_trivially_copyable can only check the type information. It does not understand the semantics. The following class is trivially copyable type, but a bitwise copy would be a bug:

#include <type_traits>

struct A {
  int* p = new int[32];
};

static_assert(std::is_trivially_copyable<A>::value, "");

After a bitwise copy, the ptr of the copy will still point to the original memory. Also see the Rule of Three.

olq_plo
  • 1,002
  • 10
  • 18
0

It will work, because a (POD-) class is the same as a struct (not completely, default access ...), in C++. And you may copy a POD struct with memcpy.

The definition of POD was no virtual functions, no constructor, deconstructor no virtual inheritance ... etc.

Christopher
  • 8,912
  • 3
  • 33
  • 38
  • As you point out, if the class has a constructor, it isn't POD. And his class has a constructor. –  Jun 11 '10 at 08:46