Why is clang not optimizing this with NRVO?

Question

I'm trying to understand why a reasonably good C++11 compiler (clang) is not optimizing this code, and I'm wondering if anybody here has opinions.

#include <iostream>
#include <utility>  // std::move
#define SLOW

struct A {
  A() {}
  ~A() { std::cout << "A d'tor\n"; }
  A(const A&) { std::cout << "A copy\n"; }
  A(A&&) { std::cout << "A move\n"; }
  A &operator =(A) { std::cout << "A copy assignment\n"; return *this; }
};

struct B {
  // Using move on a sink. 
  // Nice talk at Going Native 2013 by Sean Parent.
  B(A foo) : a_(std::move(foo)) {}  
  A a_;
};

A MakeA() {
  return A();
}

B MakeB() {  
 // The key bits are in here
#ifdef SLOW
  A a(MakeA());
  return B(a);
#else
  return B(MakeA());
#endif
}

int main() {
  std::cout << "Hello World!\n";
  B obj = MakeB();
  std::cout << &obj << "\n";
  return 0;
}

If I run this with #define SLOW commented out and optimized with -s I get

Hello World!
A move
A d'tor
0x7fff5fbff9f0
A d'tor

which is expected.

If I run this with #define SLOW enabled and optimized with -s I get:

Hello World!
A copy
A move
A d'tor
A d'tor
0x7fff5fbff9e8
A d'tor

Which obviously isn't as nice. So the question is:

Why am I not seeing NRVO applied in the "SLOW" case? I know that the compiler is not required to apply NRVO, but this seems like such a common, simple case.

In general I try to encourage code of the "SLOW" style because I find it much easier to debug.

Optimized with `-s`? If `-s` on Clang does the same as on GCC, I don't think that's what you need. `-O2` or `-O3` would be appropriate. — jogojapan, Dec 18 '13 at 05:16
@jogojapan: although `-s` doesn't optimize, it actually doesn't matter because copy elision is _not_ an optimization: it changes the behavior while optimizations are not allowed to change the behavior. "NRVO" is a misnomer. Sane compilers apply copy elision independent of the optimization settings. Sadly, there is one popular compiler which changes behavior instead. — Dietmar Kühl, Dec 18 '13 at 05:24
@DietmarKühl: the Standard calls it an optimisation: 12.8/31 "In such cases ... without the *optimization*. This elision of copy/move operations, called copy elision...". It's an optimisation with potential side effects. Clearly, if it weren't an optimisation in terms of performance/memory usage, there'd be no reason for the Standard to allow it at all. — Tony Delroy, Dec 18 '13 at 05:39
@TonyD: good point. I guess I should really raise a defect... The standard doesn't define the term, though, i.e., it is used informally. You are still better off not thinking of it as an optimization, and so is the compiler: it is confusing when some output shows up in debug mode but not in release mode because of copy elision (I have seen quite a few hours wasted on this specific problem). — Dietmar Kühl, Dec 18 '13 at 05:55
@DietmarKühl: from my perspective this is unambiguously an optimisation as it improves performance, and the issue is whether this optimisation should be required to be performed either at all optimisation levels (including whatever's nominally/otherwise an unoptimised build) or none, so the behaviour doesn't change. Problem with that is portability ultimately requires mandating or forbidding the optimisation *across compilers*. I personally consider this to be a corner of C++ where the programmer has to take some responsibility and am comfortable with that. — Tony Delroy, Dec 18 '13 at 06:39
From Stroustrup's definition it is certainly a kind of optimization: *optimizer* - a part of a compiler that *eliminates redundant operations* from code and adjusts code to perform better on a given computer. — SChepurin, Dec 18 '13 at 06:55
You should be getting RVO for `B`, but of course, `a` is required to be copied as it's taken by value. If you had done `return B(std::move(a));` you would have two moves of `a` but no copies. If `B` had taken `a` by rvalue reference, there would be just one move and no copies. — BenPope, Dec 18 '13 at 12:09
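BenPope's counts can be checked without involving RVO at all by constructing the sink in place instead of returning it. The sketch below is a minimal, hypothetical instrumentation: `Counts`, `g_counts`, the rvalue-reference sink `C`, and the `Sink*`/`CheckSinkCounts` functions are illustrative names, not part of the question's code, and counters replace the question's printouts.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumentation: tallies which A constructors ran.
struct Counts {
  int copies = 0;
  int moves = 0;
};
Counts g_counts;

struct A {
  A() {}
  A(const A&) { ++g_counts.copies; }
  A(A&&) { ++g_counts.moves; }
};

// By-value sink, as in the question.
struct B {
  B(A foo) : a_(std::move(foo)) {}
  A a_;
};

// Hypothetical alternative sink taking an rvalue reference.
struct C {
  C(A&& foo) : a_(std::move(foo)) {}
  A a_;
};

// Each experiment builds the sink in place (no return of B or C), so RVO is not involved.
Counts SinkLvalue() {
  A a;
  g_counts = Counts();
  B b(a);             // copy a into foo, then move foo into a_
  return g_counts;
}

Counts SinkMovedByValue() {
  A a;
  g_counts = Counts();
  B b(std::move(a));  // move a into foo, then move foo into a_
  return g_counts;
}

Counts SinkMovedByRvalueRef() {
  A a;
  g_counts = Counts();
  C c(std::move(a));  // foo binds directly to a; one move into a_
  return g_counts;
}

// Documents the expected tallies for each variant.
void CheckSinkCounts() {
  Counts lvalue = SinkLvalue();
  assert(lvalue.copies == 1 && lvalue.moves == 1);
  Counts by_value = SinkMovedByValue();
  assert(by_value.copies == 0 && by_value.moves == 2);
  Counts by_ref = SinkMovedByRvalueRef();
  assert(by_ref.copies == 0 && by_ref.moves == 1);
}
```

The rvalue-reference sink saves one move, but it no longer accepts an lvalue such as `a` without an explicit `std::move` or copy, which is the trade-off behind the by-value sink idiom.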

score 13 · Accepted Answer · answered Dec 18 '13 at 05:17

The simple answer is: because the compiler is not allowed to apply copy elision in this case. Copy elision is permitted only in a few very specific situations. The quote from the standard is 12.8 [class.copy] paragraph 31:

... This elision of copy/move operations, called copy elision, is permitted in the following circumstances (which may be combined to eliminate multiple copies):

in a return statement in a function with a class return type, when the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter) with the same cv-unqualified type as the function return type, the copy/move operation can be omitted by constructing the automatic object directly into the function’s return value

[...]

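Clearly, `a` (of type `A`) is not what is being returned: the return expression `B(a)` has type `B`, so this bullet does not permit eliding the copy of `a`. The other bullets in the same paragraph refer to throw expressions, eliding copies from a temporary, and exception declarations. None of these apply to the copy of `a`, which is a named lvalue passed to a constructor.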
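A hedged sketch of the distinction, reusing the question's `A` and `B` with hypothetical counters in place of the printouts: `ReturnNamedLocal` has exactly the shape the quoted bullet describes, so it never copies (if NRVO is not applied, paragraph 32 still selects the move constructor), while `ReturnWrapped` mirrors the SLOW path and must copy `a` once. The names `Counts`, `g_counts`, `ReturnNamedLocal`, `ReturnWrapped`, and `CheckReturnShapes` are illustrative only.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumentation: tallies which A constructors ran.
struct Counts {
  int copies = 0;
  int moves = 0;
};
Counts g_counts;

struct A {
  A() {}
  A(const A&) { ++g_counts.copies; }
  A(A&&) { ++g_counts.moves; }
};

struct B {
  B(A foo) : a_(std::move(foo)) {}
  A a_;
};

// Matches the quoted bullet: the operand names a local A and the return type is A.
A ReturnNamedLocal() {
  A a;
  return a;  // NRVO may build `a` in the return slot; otherwise `a` is moved
}

// The question's SLOW shape: the returned object is a B temporary, and `a` is
// only an lvalue argument to B's constructor, so it is copied into `foo`.
B ReturnWrapped() {
  A a;
  return B(a);
}

void CheckReturnShapes() {
  g_counts = Counts();
  A x = ReturnNamedLocal();
  (void)x;
  assert(g_counts.copies == 0);  // elided or moved, never copied

  g_counts = Counts();
  B y = ReturnWrapped();
  (void)y;
  assert(g_counts.copies == 1);  // the copy of `a` cannot be elided
}
```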

Dietmar Kühl

Thanks @DietmarKühl. Still seems odd to me that this case isn't allowed to be optimized. It would seem to be a very common case, and would optimize a good chunk of C++ code. I hate the fact that these types of patterns need to be basically memorized to write efficient C++ code. – dmaclach Dec 18 '13 at 08:44
@dmaclach not really. Even without memorizing the standard: `a` was not a return value in your code, so NRVO makes no sense. Otherwise, elision mostly occurs when an unnamed temporary is used to create an object, and `a` has a name. So the copy of `a` won't be elided if it has side effects. The implicit rvalue rule also will not apply, as the `return` is not a simple `return var;` (implicit rvalue is something you do need to learn about, admittedly) – Yakk - Adam Nevraumont Dec 18 '13 at 11:05
@Yakk: implicit rvalue is paragraph 32 of the same section, basically saying that the implicit rvalue rule applies in the same situations as copy elision, extended to all local variables, i.e., function parameters are also included. – Dietmar Kühl Dec 18 '13 at 11:42
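To illustrate that the implicit-rvalue rule also covers function parameters (where copy elision itself is not permitted), here is a minimal sketch with hypothetical counters; `Passthrough`, `PassthroughOfLvalue`, and `CheckParameterReturn` are illustrative names. Returning the by-value parameter `p` uses the move constructor, whereas in `return Passthrough(a);` nothing turns `a` into an rvalue, so it is copied, matching Yakk's remark about `return var;`.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumentation: tallies which A constructors ran.
struct Counts {
  int copies = 0;
  int moves = 0;
};
Counts g_counts;

struct A {
  A() {}
  A(const A&) { ++g_counts.copies; }
  A(A&&) { ++g_counts.moves; }
};

// `p` is a parameter, so it cannot be constructed in the return slot,
// but `return p;` still treats it as an rvalue and moves it.
A Passthrough(A p) {
  return p;
}

// `a` is only an lvalue argument here, so it is copied into `p`.
A PassthroughOfLvalue() {
  A a;
  return Passthrough(a);
}

void CheckParameterReturn() {
  A source;
  g_counts = Counts();
  A moved = Passthrough(std::move(source));
  (void)moved;
  assert(g_counts.copies == 0);
  assert(g_counts.moves >= 2);  // into p, then out of p (plus any unelided temporary)

  g_counts = Counts();
  A copied = PassthroughOfLvalue();
  (void)copied;
  assert(g_counts.copies == 1);
}
```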

score 3 · Answer 2 · answered Jan 05 '14 at 23:45

The copy that you see in the slow path is not caused by a lack of RVO, but by the fact that in B(MakeA()), "MakeA()" is an rvalue, while in B(a), "a" is an lvalue.

To make this clear let's modify the slow path to indicate where MakeA() is complete:

#ifdef SLOW
  A a(MakeA());
  std::cout << "---- after call \n";
  return B(a);
#else

The output is:

Hello World!
---- after call 
A copy
A move
A d'tor
A d'tor
0x7fff5a831b28
A d'tor

Which shows that no copy was done in

A a(MakeA());

Thus, RVO did happen.

The fix, which removes all copy, is:

return B(std::move(a));
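The fix can be checked against the question's own structure. A minimal sketch, with hypothetical counters standing in for the printouts and `MakeBFixed`/`CheckFixedMakeB` as illustrative names: the only `A` operations left are moves (into `foo`, then into `a_`, plus any the compiler chooses not to elide), so `a` keeps its name for debugging without paying for a copy.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumentation: tallies which A constructors ran.
struct Counts {
  int copies = 0;
  int moves = 0;
};
Counts g_counts;

struct A {
  A() {}
  A(const A&) { ++g_counts.copies; }
  A(A&&) { ++g_counts.moves; }
};

struct B {
  B(A foo) : a_(std::move(foo)) {}
  A a_;
};

A MakeA() {
  return A();
}

// The question's SLOW path with the suggested std::move applied.
B MakeBFixed() {
  A a(MakeA());            // RVO: typically no copy or move here
  return B(std::move(a));  // move into foo, then move into a_
}

void CheckFixedMakeB() {
  g_counts = Counts();
  B obj = MakeBFixed();
  (void)obj;
  assert(g_counts.copies == 0);
  assert(g_counts.moves >= 2);
}
```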
