26

Sometimes it's wise to split complicated or long expressions into multiple steps. For example (the second version isn't any clearer here, but it's just an example):

return object1(object2(object3(x)));

can be written as:

object3 a(x);
object2 b(a);
object1 c(b);
return c;

Assuming all three classes implement constructors that take an rvalue reference as a parameter, the first version might be faster, because temporary objects are passed and can be moved from. I'm assuming that in the second version the local variables are considered to be lvalues. But if the variables aren't used later, do C++11 compilers optimize the code so that the variables are treated as rvalues and both versions behave exactly the same? I'm mostly interested in Visual Studio 2013's C++ compiler, but I'd also be happy to know how GCC behaves in this matter.

Thanks, Michal

Michał Fronczyk

4 Answers

24

The compiler cannot break the "as-if" rule in this case. But you can use std::move to achieve the desired effect:

object3 a(x);
object2 b(std::move(a));
object1 c(std::move(b));
return c;
juanchopanza
  • 1
    Right, move can be used explicitly. But why would that break the rule exactly? – Michał Fronczyk Feb 17 '14 at 14:03
  • 4
    @MichałFronczyk In order to treat lvalues as rvalues, the compiler would need to select a different overload (rvalue ref vs. lvalue ref), and that would require that it knows that there will be no observable difference in picking one or the other. That sounds a) complicated and b) unlikely (but not impossible.) – juanchopanza Feb 17 '14 at 14:05
  • 4
    @juanchopanza: I'd expect the compiler at best to inline the lvalue constructors and then optimize, ignoring the existence of the rvalue constructors. Suppose the optimizer did pay attention to the rvalue ctors -- either it is able to prove they're equivalent for this code (in which case it can in principle produce code as good as the rvalue constructor from the lvalue ctor anyway without needing the rvalue ctor) or it can't (in which case it has to use the lvalue constructor anyway per the standard). So the existence of the rvalue constructors is irrelevant, I think. – Steve Jessop Feb 17 '14 at 15:03
  • @SteveJessop I would agree. The compiler would be unlikely to optimize by considering the lvalues to be rvalues. It is too complicated, and it has other tools at its disposal. – juanchopanza Feb 17 '14 at 15:06
  • 2
    @juanchopanza: there is one trick that the question suggests to me, though, which is that if the rvalue ctor exists and the compiler can prove that it's equivalent for this code, then it *might* be a good strategy to assume that it is efficient, and therefore prefer to use it rather than something else the optimizer might invent, as the basis for all further steps of optimization. I totally agree with you that it all sounds unlikely, though. Optimizers don't *typically* care much for the programmer's opinion what's efficient ;-) – Steve Jessop Feb 17 '14 at 15:09
14

As juanchopanza said, the compiler cannot (at the C++ level) violate the "as-if" rule; that is, every transformation must produce semantically equivalent code.

However, beyond the C++ level, when the code is optimized, further opportunities may arise.

As such, it really depends on the objects themselves: if the move-constructors/destructors have side effects, and (de)allocating memory is a side effect, then the optimization cannot occur. If you use only PODs, with default move-constructors/destructors, then it will probably be automatically optimized.

Matthieu M.
9

But if the variables aren't later used, do C++11 compilers optimize the code so the variables are considered to be rvalues and both versions work exactly the same?

It is possible but it greatly depends on your types. Consider the following example with a POD type point:

#include <cstdio>

struct point {
  int x;
  int y;
};

static point translate(point p, int dx, int dy) {
  return { p.x + dx, p.y + dy };
}

static point mirror(point p) {
  return { -p.x, -p.y };
}

static point make_point(int x, int y) {
  return { x, y };
}

int main() {
  point a = make_point(1, 2);
  point b = translate(a, 3, 3);
  point c = mirror(b);

  std::printf("(x,y) = (%d,%d)\n", c.x, c.y);
}

I looked at the assembly code; here is what the whole program(!) was basically compiled into (the code below is a C approximation of the generated assembly):

int main() {
  std::printf("(x,y) = (-4,-5)\n");
}

It not only got rid of all the local variables, it also did the computations at compile time! I have tried both gcc and clang but not msvc.

OK, so let's make the program a little more complicated so that it cannot do the computations:

int main(int argc, char* argv[]) {

  int x = *argv[1]-'0';
  int y = *argv[2]-'0';
  point a = make_point(x,y);
  point b = translate(a, 3, 3);
  point c = mirror(b);

  std::printf("(x,y) = (%d,%d)\n", c.x, c.y);
}

To run this code, you would have to call it like ./a.out 1 2.

This whole program is reduced to this one (assembly rewritten in C) after optimization:

int main(int argc, char* argv[]) {
  int x = *argv[1]-'0';
  int y = *argv[2]-'0';
  std::printf("(x,y) = (%d,%d)\n", -(x+3), -(y+3));
}

So it got rid of a, b, c and all the functions make_point(), translate() and mirror(), and did as much of the computation as possible at compile time.

For the reasons mentioned in Matthieu M.'s answer, don't expect optimizations this good with more complicated types (especially non-PODs).

In my experience, inlining is crucial. Work hard so that your functions can be easily inlined. Use link-time optimization.

Ali
7

Be aware that, besides move semantics, which can greatly speed up your code, the compiler also performs (N)RVO, (Named) Return Value Optimization, which can make your code even more efficient. I tested your example, and with g++ 4.8 it appears that your second example could actually be more optimal:

object3 a(x);
object2 b(a);
object1 c(b);
return c;

From my experiments, it looks like it calls a constructor/destructor 8 times (1 ctor + 2 copy ctors + 1 move ctor + 4 dtors), compared to the other method, which calls them 10 times (1 ctor + 4 move ctors + 5 dtors). But as user2079303 has commented, move constructors should still outperform copy constructors; also, in this example all calls will be inlined, so there is no function call overhead.

Copy/move elision is actually an exception to the "as-if" rule, which means that you may sometimes be surprised to find that your constructor/destructor does not get called even though it has side effects.

http://coliru.stacked-crooked.com/a/1ca7ebec0567e48f

(You can disable (N)RVO with the -fno-elide-constructors option.)

#include <iostream>
#include <memory>

template<int S>
struct A {
    A() { std::cout<<"A::A"<<std::endl; }    
    template<int S2>
    A(const A<S2>&) { std::cout<<"A::A&"<<std::endl; }
    template<int S2>
    A(const A<S2>&&) { std::cout<<"A::A&&"<<std::endl; }    
    ~A() { std::cout<<"~A::A"<<std::endl;}        
};
A<0> foo () {    
    A<2> a; A<1> b(a); A<0> c(b); return c;   // calls dtor/ctor 8 times
    //return A<0>(A<1>(A<2>()));  // calls dtor/ctor 10 times
}
int main()
{
   A<0> a=foo();
   return 0;
}
marcinj
  • One thing to note is that the nested calls in this case invoke move constructor thrice, while the temporary values style invokes copy constructor twice instead. In some cases move constructor can be much faster than a copy constructor so optimality would depend on the type, wouldn't it? – eerorika Feb 17 '14 at 15:11
  • I tested your code with `std::move` like in juanchopanza's answer and that turned those two copies into moves. So in that case it would seem to be more optimal. – eerorika Feb 17 '14 at 15:41
  • 1
    One thing I have noted is that for `return A<0>(A<0>(A<0>()));` with no -std=c++11, you get only A::A and ~A::A - so RVO is hard at work here. While with move semantics you will get 10 ctr/dtr calls - so RVO is not used. – marcinj Feb 17 '14 at 15:48