How to allow copy elision construction for C++ classes (not just POD C structs)

Question

Consider the following code:

#include <iostream>
#include <type_traits>

struct A
{
  A() {}
  A(const A&) { std::cout << "Copy" << std::endl; }
  A(A&&) { std::cout << "Move" << std::endl; }
};

template <class T>
struct B
{
  T x;
};

#define MAKE_B(x) B<decltype(x)>{ x }

template <class T>
B<T> make_b(T&& x)
{
  return B<T> { std::forward<T>(x) };
}

int main()
{
  std::cout << "Macro make b" << std::endl;
  auto b1 = MAKE_B( A() );
  std::cout << "Non-macro make b" << std::endl;
  auto b2 = make_b( A() );
}

This outputs the following:

Macro make b
Non-macro make b
Move

Note that b1 is constructed without a move, but the construction of b2 requires a move.

I also need to type deduction, as A in real life usage may be a complex type which is difficult to write explicitly. I also need to be able to nest calls (i.e. make_c(make_b(A()))).

Is such a function possible?

Further thoughts:

N3290 Final C++0x draft page 284:

This elision of copy/move operations, called copy elision, is permitted in the following circumstances:

when a temporary class object that has not been bound to a reference (12.2) would be copied/moved to a class object with the same cv-unqualified type, the copy/move operation can be omitted by constructing the temporary object directly into the target of the omitted copy/move

Unfortunately this seems that we can't elide copies (and moves) of function parameters to function results (including constructors) as those temporaries are either bound to a reference (when passed by reference) or no longer temporaries (when passed by value). It seems the only way to elide all copies when creating a composite object is to create it as an aggregate. However, aggregates have certain restrictions, such as requiring all members be public, and no user defined constructors.

I don't think it makes sense for C++ to allow optimizations for POD C-structs aggregate construction but not allow the same optimizations for non-POD C++ class construction.

Is there any way to allow copy/move elision for non-aggregate construction?

My answer:

This construct allows for copies to be elided for non-POD types. I got this idea from David Rodríguez's answer below. It requires C++11 lambdas. In this example below I've changed make_b to take two arguments to make things less trivial. There are no calls to any move or copy constructors.

#include <iostream>
#include <type_traits>

struct A
{
  A() {}
  A(const A&) { std::cout << "Copy" << std::endl; }
  A(A&&) { std::cout << "Move" << std::endl; }
};

template <class T>
class B
{
public:
  template <class LAMBDA1, class LAMBDA2>
  B(const LAMBDA1& f1, const LAMBDA2& f2) : x1(f1()), x2(f2()) 
  { 
    std::cout 
    << "I'm a non-trivial, therefore not a POD.\n" 
    << "I also have private data members, so definitely not a POD!\n";
  }
private:
  T x1;
  T x2;
};

#define DELAY(x) [&]{ return x; }

#define MAKE_B(x1, x2) make_b(DELAY(x1), DELAY(x2))

template <class LAMBDA1, class LAMBDA2>
auto make_b(const LAMBDA1& f1, const LAMBDA2& f2) -> B<decltype(f1())>
{
  return B<decltype(f1())>( f1, f2 );
}

int main()
{
  auto b1 = MAKE_B( A(), A() );
}

If anyone knows how to achieve this more neatly I'd be quite interested to see it.

Previous discussion:

This somewhat follows on from the answers to the following questions:

Can creation of composite objects from temporaries be optimised away?
Avoiding need for #define with expression templates
Eliminating unnecessary copies when building composite objects

"...that avoids the move (which may be expensive..." - If you have move constructors that are expensive to execute, then you're doing it wrong. Typically move constructors involve nothing more than assigning pointers or handles. — In silico, May 04 '11 at 02:18
@in silico: as far as I understand, move only helps heap allocated objects, not large stack ones. If A is a std::array, move doesn't help. — Clinton, May 04 '11 at 02:19
@in @Clinton: Just before you two get your wires crossed, just let me say from a third person perspective you're both right in your own regard. When In silico says "expensive", he means compared to the copy constructor, but when Clinton say "expensive", he means in general. In other words, Clinton is right: it can't help there, but doesn't hurt (is still expensive, but not more so); and In silico is right: it shouldn't ever hurt, compared to copying. — GManNickG, May 04 '11 at 02:46
@GMan: Thanks for clarifying, that's pretty much what I was thinking when I meant "expensive". — In silico, May 04 '11 at 02:58
If you declare `make_b` as `inline` and compile with full optimizations does it still do the move? Or is the construction then elided? — edA-qa mort-ora-y, May 04 '11 at 05:04
Non-macro solution that avoids the move...? `B{ A() }`? :) Are you sure you're not over-engineering this? — AshleysBrain, May 06 '11 at 07:39
@AshleysBrain: That requires me to know the type of A, or basically do what the macro does anyway. Also, perhaps more importantly, that solution only works when A is an aggregate, which requires all members to be public and no custom constructors. — Clinton, May 09 '11 at 06:18
@Clinton: Have you tried direct initialisation of `b2`: `auto b2( make_b( A() ) )`? — Marc Mutz - mmutz, May 18 '11 at 17:48
@Marc Mutz - mmutz: Could you answer with the code that avoids "move" or "copy" being printed? (and also with your compiler version). — Clinton, May 20 '11 at 02:14
With regard to your recent edit, `B` IS C++03 POD whenever `T` is (all data members like within a single accessibility region, and no user-defined default or copy constructor or destructor). C++0x replaces POD with the concepts of "trivially copyable" and "layout compatible", and again, `B` has these characteristics whenever `T` does. — Ben Voigt, May 22 '11 at 10:52

score 10 · Accepted Answer · answered May 20 '11 at 23:16

As Anthony has already mentioned, the standard forbids copy elision from the argument of a function to the return of the same function. The rationale that drives that decision is that copy elision (and move elision) is an optimization by which two objects in the program are merged into the same memory location, that is, the copy is elided by having both objects be one. The (partial) standard quote is below, followed by a set of circumstances under which copy elision is allowed, which do not include that particular case.

So what makes that particular case different? The difference is basically that the fact that there is a function call between the original and the copied objects, and the function call implies that there are extra constraints to consider, in particular the calling convention.

Given a function T foo( T ), and a user calling T x = foo( T(param) );, in the general case, with separate compilation, the compiler will create an object $tmp1 in the location that the calling convention requires the first argument to be. It will then call the function and initialize x from the return statement. Here is the first opportunity for copy elision: by carefully placing x on the location where the returned temporary is, x and the returned object from foo become a single object, and that copy is elided. So far so good. The problem is that the calling convention in general will not have the returned object and the parameter in the same location, and because of that, $tmp1 and x cannot be a single location in memory.

Without seeing the function definition the compiler cannot possibly know that the only purpose of the argument to the function is to serve as return statement, and as such it cannot elide that extra copy. It can be argued that if the function is inline then the compiler would have the missing extra information to understand that the temporary used to call the function, the returned value and x are a single object. The problem is that that particular copy can only be elided if the code is actually inlined (not only if it is marked as inline but actually inlined) If a function call is required, then the copy cannot be elided. If the standard allowed that copy to be elided when the code is inlined, it would imply that the behavior of a program would differ due to the compiler and not user code --the inline keyword does not force inlining, it only means that multiple definitions of the same function do not represent a violation of the ODR.

Note that if the variable was created inside the function (as compared to passed into it) as in: T foo() { T tmp; ...; return tmp; } T x = foo(); then both copies can be elided: There is no restriction as of where tmp has to be created (it is not an input or output parameter to the function so the compiler is able to relocate it anywhere, including the location of the returned type, and on the calling side, x can as in the previous example be carefully located in the location of that same return statement, which basically means that tmp, the return statement and x can be a single object.

As of your particular problem, if you resort to a macro, the code is inlined, there are no restrictions on the objects and the copy can be elided. But if you add a function, you cannot elide the copy from the argument to the return statement. So just avoid it. Instead of using a template that will move the object, create a template that will construct an object:

template <typename T, typename... Args>
T create( Args... x ) {
   return T( x... );
}

And that copy can be elided by the compiler.

Note that I have not dealt with move construction, as you seem concerned on the cost of even move construction, even though I believe that you are barking at the wrong tree. Given a motivating real use case, I am quite sure that people here will come up with a couple of efficient ideas.

12.8/31

When certain criteria are met, an implementation is allowed to omit the copy/move construction of a class object, even if the copy/move constructor and/or destructor for the object have side effects. In such cases, the implementation treats the source and target of the omitted copy/move operation as simply two different ways of referring to the same object, and the destruction of that object occurs at the later of the times when the two objects would have been destroyed without the optimization.

Fred Nurk · Answer 2 · 2011-05-04T02:37:16.283

5

... but the construction of b2 requires a move.

No, it doesn't. The compiler is allowed to elide the move; whether that happens is implementation-specific, depending on several factors. It is also allowed to move, but it cannot copy (moving must be used instead of copying in this situation).

It is true that you are not guaranteed that the move will be elided. If you must be guaranteed that no move will occur, then either use the macro or investigate your implementation's options to control this behavior, particularly function inlining.

edited May 04 '11 at 02:37

answered May 04 '11 at 02:29

Fred Nurk

13,952
4
37
63

+1 for function inlining. Macros that act as functions should generally be replacable by inline [template] functions. Case at hand: `min` and `max`, thanks to `decltype`. – Matthieu M. May 04 '11 at 06:29
@Matthieu, Fred: Do you know a compiler/inline combination that will remove the move (i.e. so the output is "Macro make b, Non-macro make b")? – Clinton May 05 '11 at 00:30
@Clinton: I don't depend on 0x in production code yet, much less know specific quirks of implementations of it. Sorry. – Fred Nurk May 05 '11 at 00:42
1

@Clinton: try `-O2` so that the compiler optimizes the code. Or try returning by value to have RVO kick in. – Matthieu M. May 05 '11 at 06:18
@Fred: My reading of the standard suggests the compiler can not elided the copy/move (I've added my reasoning to the question). Do you think otherwise? – Clinton May 06 '11 at 02:44
@Clinton: I was focusing on elision of the return value from make_b (which gets constructed into b2) and missed there was another elision that matters to you. I'll take another look later. – Fred Nurk May 06 '11 at 03:01
@Fred: The elision out happens in both cases even with no optimization with gcc. Its the elision into the struct which is the issue. – Clinton May 06 '11 at 05:29

score 4 · Answer 3 · answered May 06 '11 at 10:14

4

You cannot optimize out the copy/move of the A object from the parameter of make_b to the member of the created B object.

However, this is the whole point of move semantics --- by providing a light-weight move operation for A you can avoid a potentially expensive copy. e.g. if A was actually std::vector<int>, then the copying of the vector's contents can be avoided by use of the move constructor, and instead just the housekeeping pointers will be transferred.

answered May 06 '11 at 10:14

Anthony Williams

66,628
14
133
155

You can optimise the move out of MAKE_B. Why not make_b? Move semantics doesn't help when your objects are stack allocated, i.e. std::arrays, or named parameters. I find it hard to understand why C++ would force you to use public member aggregates and macros to allow copy-elision on construction. – Clinton May 06 '11 at 23:20
Copy elision is an optimization, and is never guaranteed anyway. Yes, sometimes move semantics don't help because there isn't a cheap move operation, but in general I wouldn't worry. In a particular scenario, if you care about the performance, use a profiler. If the "redundant" copy is the bottleneck, use a macro right there. – Anthony Williams May 23 '11 at 11:18

score 1 · Answer 4 · answered May 07 '11 at 18:23

1

This isn't a big problem. All it needs is changing the structure of the code slightly.

Instead of:

B<A> create(A &&a) { ... }
int main() { auto b = create(A()); }

You can always do:

int main() { A a; B<A> b(a); ... }

If the constructor of B is like this, then it'll not take any copies:

template<class T>
class B { B(T &t) :t(t) { } T &t; };

The composite case will work too:

struct C { A a; B b; };
void init(C &c) { c.a = 10; c.b = 20; }
int main() { C c; init(c); }

And it doesn't even need c++0x features to do this.

answered May 07 '11 at 18:23

tp1

288
1
3

1

You can optimize everything by replacing "safe" constructs by more fragile ones. Storing A inside B by value is safe, storing a reference only requires far more thought to be used correctly. The point is to find a way to do it safely *and* without unnecessary overhead at the same time. – Paul Groke May 07 '11 at 18:38
well, if you want to return the value from a function, avoiding the move is impossible because the scope where the values are located in is disappearing. That's why the object needs to be originally in upper scope. There is nothing we can do to fix it, if compilers cannot somehow automatically move objects from one scope to another, – tp1 May 07 '11 at 18:51
actually, another alternative for "moving objects from scope to another" would be to redesign how c++ program's stack works. Instead of stack, you should use some tree which can not only store call stack, but also all previous call stacks which are still in use. This would fix it, but who is brave enough to implement it in their compiler? – tp1 May 07 '11 at 19:05
Returning a local object without copying it is entirely possible and implemented by most compilers (see RVO/NRVO). Even passing a returned object to another function (by value) is possible without copying it. All without inlining. However returning an argument without copying it is not - at least I wouldn't know how it could be implemented without inlining everything. – Paul Groke May 07 '11 at 19:20
But compilers can automatically move objects from one scope to another. They do it when constricting aggregates as per my example. They elide the move/copy, in gcc even with no optimization set. All I want them to do is the same on other non-aggregate constructor calls. – Clinton May 09 '11 at 04:17

How to allow copy elision construction for C++ classes (not just POD C structs)

4 Answers4

Linked