Why are by-value parameters excluded from NRVO?

Question

Imagine:

S f(S a) {
  return a;
}

Why is it not allowed to alias a and the return value slot?

S s = f(t);
S s = t; // can't generally transform it to this :(

The spec doesn't allow this transformation if the copy constructor of S has side effects. Instead, it requires at least two copies: one from t to a, and one from a to the return value. A third copy, from the return value to s, is the only one that can be elided. (Note that I wrote = t above to represent the copy of t into f's a, the only copy that would still be mandatory if the move/copy constructor has side effects.)
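To make the counting concrete, here is a minimal sketch with an instrumented S. The counters `copies` and `moves` and the helper `run_once` are names introduced here for illustration only; they are not part of the question. It assumes a C++11 or later compiler, where returning a by-value parameter is treated as a move when a move constructor is available.

```cpp
struct S {
    static int copies;
    static int moves;
    S() = default;
    S(const S&) { ++copies; }     // observable side effect
    S(S&&) noexcept { ++moves; }  // observable side effect
};
int S::copies = 0;
int S::moves = 0;

S f(S a) {
    return a;  // `a` is a parameter, so this copy/move may not be elided
}

// Hypothetical helper: resets the counters and runs `S s = f(t);` once.
void run_once() {
    S::copies = 0;
    S::moves = 0;
    S t;
    S s = f(t);  // t -> a is a copy; a -> return object is a move
    (void)s;
}
```

After `run_once()`, `S::copies` is 1 (t into a) and `S::moves` is at least 1 (a into the return object). The step from a into the return object is the one the question asks about.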

Why is that?

Because it isn't very useful to return the parameter unchanged? — Bo Persson, May 15 '11 at 15:39
@BoP what if we change the parameter? `S f(S s) { for(E &e : s) e.toupper(); return s; }`. The compiler could do NRVO, and ignore the `return s`, because the return value is already in place. One copy/move less! — Johannes Schaub - litb, May 15 '11 at 15:42
Hmm, maybe it has to do with calling conventions? The caller would have to know in what outgoing argument slot the return value is to be found. That's different for the other allowed forms of NRVO, it seems. I would like to get a nice answer explaining it :) — Johannes Schaub - litb, May 15 '11 at 15:54
Are you looking for a quote from the standard or for a good explanation? — fredoverflow, May 15 '11 at 15:56
@Fred I don't think that the standard explains that. If you have a quote that explains it better than you could, then of course you don't need your own explanation. If not, I always welcome a good explanation of the rationale. — Johannes Schaub - litb, May 15 '11 at 16:00
@Johannes: does it mean that you would get one less copy in case the code is inlined (since the copy was not needed), or does the behavior remain (i.e., a copy is made)? — Matthieu M., May 15 '11 at 16:09
@Matthieu it doesn't matter whether or not the code is inlined. the copy has always to be done if it has side effects and NRVO cannot apply (observable side effects, that is). I would get one less copy if NRVO could apply in my case, I think. — Johannes Schaub - litb, May 15 '11 at 16:13
@Johannes: I admit that the consideration for observable side effects mesmerizes me. NRVO normally applies whether or not the copy/move constructor have side effects. I can understand the calling convention, but I see no reason why this should figure in the standard. — Matthieu M., May 15 '11 at 16:17
@Matthieu indeed, that's a good point. Why doesn't the spec allow it anyway? Then, if the code is inlined, it can recombine the inlined body with the caller's code and eliminate the copy even if it would have side effects. — Johannes Schaub - litb, May 15 '11 at 16:43
@Bo, @Neil: surely the question isn't, "would this generally be a good/easy optimization?", the question is, "why does the standard contain additional text, just to forbid this optimization?". There must have been a positive reason to forbid it; "it's not worth making" only explains the situation if it's just a matter of the standard omitting to allow it. — Steve Jessop, May 15 '11 at 16:44
@Steve Does the standard forbid it? My impression is that it simply doesn't explicitly allow it. — , May 15 '11 at 16:50
@Neil I think that until a few months ago, this wasn't explicitly forbidden in the spec, and only the expert committee members knew that they were not allowed to do it. Recent C++0x drafts explicitly forbid it. See http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1148 — Johannes Schaub - litb, May 15 '11 at 16:54
@Neil: sorry, I'm saying "the standard" when I mean "the FDIS", not the current standard. Unless some odd coincidence is going on, this question is related to http://stackoverflow.com/questions/6009004/are-value-parameters-implicitly-moved-when-returned-by-value — Steve Jessop, May 15 '11 at 16:58
@Johannes: I must admit I don't understand how this change satisfies the goal stated "It is unclear whether copy elision is permitted when returning a parameter of class type. If not, it should still be possible to move, rather than copy, the return value." — Matthieu M., May 15 '11 at 18:04
@Johannes: I don't see how forbidding copy elision suddenly makes it possible to move, **rather than copy**, the return value. It seemed to me that, if possible, elision should be preferred to move (a no-op is always faster than anything). But then I am just back from a whole-day hike so my mind may be a little slow :) — Matthieu M., May 15 '11 at 18:20
@Matthieu the specification wasn't clear as to whether a return of a by-value parameter can be copy-elided. So [DE11](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3296.html#DE11) asked for clarification. The specification basically says "If you are allowed to elide a copy, you always have to treat the expression as an rvalue if you cannot elide it.". So if they want to forbid to elide a copy, but still want to automatically move, they have to tweak the wording not to only rely on copy-elision anymore. That's what they did. Automatic move was required before that change too. — Johannes Schaub - litb, May 15 '11 at 18:23
@Johannes: Thanks for the clarification :) Sorry it didn't help solving the issue at hand though. — Matthieu M., May 15 '11 at 18:25

Nicol Bolas · Answer 1 · 2012-03-31T18:49:19.403

Here's why copy elision doesn't make sense for parameters. It's really about the implementation of the concept at the compiler level.

Copy elision works by essentially constructing the return value in-place. The value isn't copied out; it's created directly in its intended destination. It's the caller who provides the space for the intended output, and thus it's ultimately the caller who provides the possibility for the elision.

All that the function internally needs to do in order to elide the copy is construct the output in the place provided by the caller. If the function can do this, you get copy elision. If the function can't, then it will use one or more temporary variables to store the intermediate results, then copy/move this into the place provided by the caller. It's still constructed in-place, but the construction of the output happens via copy.

So the world outside of a particular function doesn't have to know or care about whether a function does elision. Specifically, the caller of the function doesn't have to know about how the function is implemented. It's not doing anything different; it's the function itself that decides if elision is possible.
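As a rough illustration of "constructing the output in the place provided by the caller", the sketch below returns a named local. The counters and the helper `use_make_local` are hypothetical names added for this example, and whether the move is actually elided depends on the compiler, since NRVO is permitted but not required.

```cpp
struct S {
    static int copies;
    static int moves;
    int value = 0;
    S() = default;
    S(const S& o) : value(o.value) { ++copies; }
    S(S&& o) noexcept : value(o.value) { ++moves; }
};
int S::copies = 0;
int S::moves = 0;

// A named local may be built directly in the caller-provided return slot (NRVO).
S make_local() {
    S local;
    local.value = 42;
    return local;  // elision is permitted here; otherwise this falls back to a move
}

// Hypothetical helper: the caller only sees the finished object.
int use_make_local() {
    S::copies = 0;
    S::moves = 0;
    S result = make_local();
    return result.value;
}
```

The caller's code is the same whether or not the callee elides: it hands over storage for `result` and gets a constructed S back. No copy constructor call is needed in either case; only the number of moves can vary.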

Storage for value parameters is also provided by the caller. When you call f(t), it is the caller that creates the copy of t and passes it to f. Similarly, if S is implicitly constructible from an int, then f(5) will construct an S from the 5 and pass it to f.

This is all done by the caller. The callee doesn't know or care that it was a variable or a temporary; it's just given a spot of stack memory (or registers or whatever).

Now remember: copy elision works because the function being called constructs the variable directly into the output location. So if you're trying to elide the return from a value parameter, then the storage for the value parameter must also be the output storage itself. But remember: it is the caller that provides that storage for both the parameter and the output. And therefore, to elide the output copy, the caller must construct the parameter directly into the output.

To do this, now the caller needs to know that the function it's calling will elide the return value, because it can only stick the parameter directly into the output if the parameter will be returned. That's not going to generally be possible at the compiler level, because the caller doesn't necessarily have the implementation of the function. If the function is inlined, then maybe it can work. But otherwise no.
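The argument above can be sketched by writing the call out by hand. This is only an illustrative model, not any real ABI: `f_lowered`, `call_site`, `arg_slot` and `ret_slot` are made-up names, and real calling conventions differ in detail. The point it tries to show is that the callee receives two caller-chosen locations and has no way of knowing whether they coincide.

```cpp
#include <new>      // placement new
#include <utility>  // std::move

struct S {
    static int copies;
    static int moves;
    int value;
    explicit S(int v) : value(v) {}
    S(const S& o) : value(o.value) { ++copies; }
    S(S&& o) noexcept : value(o.value) { ++moves; }
};
int S::copies = 0;
int S::moves = 0;

// Hand-written model of `S f(S a) { return a; }` with a hidden return slot.
// The callee gets two caller-chosen locations and cannot assume they alias.
S* f_lowered(void* ret_slot, S* a) {
    return ::new (ret_slot) S(std::move(*a));  // a -> return object
}

// Hand-written model of the call site `S s = f(t);`.
int call_site(const S& t) {
    alignas(S) unsigned char arg_slot[sizeof(S)];
    alignas(S) unsigned char ret_slot[sizeof(S)];  // doubles as the storage for s
    S* a = ::new (arg_slot) S(t);   // t -> a, performed by the caller
    S* s = f_lowered(ret_slot, a);  // return object built directly in s
    int result = s->value;
    a->~S();
    s->~S();
    return result;
}
```

In this model the caller can merge the return object with s on its own, because it picks `ret_slot`. Merging a with the return object would require the caller to pass the same address for both slots, which it can only do if it knows what the body of f returns.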

Therefore, the C++ committee didn't bother to allow for the possibility.

Between C++03 and C++11, the committee changed "the expression is the name of a non-volatile automatic object" to "the expression is the name of a non-volatile automatic object (other than a function or catch-clause parameter)". So it is not the case that the committee "didn't bother to allow for the possibility". It was permitted in C++03 (perhaps accidentally), and then the committee went out of its way to ban it in C++11. — Steve Jessop, May 12 '12 at 18:08
What @SteveJessop said. I can't believe Nicol's answer has gotten so many upvotes; it's blatantly incorrect. — Quuxplusone, Dec 17 '13 at 09:50
Looks correct retroactively ;) Did any compilers implement the possibly accidental allowance for NRVO from arguments? I might _assume_ not, for the convincing reasons Nicol gave. And after C++11 amending the wording (albeit with no real discussion), they're not allowed to do so. So in practical terms, maybe per the _intention_ of C++ <11, and as of now - this looks like a correct answer. I'm sure with advanced LTO etc, compilers _could_ perform copy-elision in this case, but it seems the Committee didn't want to think through making that a well-defined thing. — underscore_d, Aug 07 '16 at 16:53
This is not entirely correct. Copy elision is about avoiding copy initialization of various objects, which has nothing to do with the "return value". In the language, there is no requirement to enforce the copy initialization being at the caller site. — FrankHB, Oct 14 '20 at 18:14
For contemporary implementations, the key point is that elision of a parameter and of the returned object can't coexist in cases where the implementation of the function is invisible, particularly across different TUs without whole-program analysis. Code of such functions needs to obey ABI rules in general, which may in turn require the caller to do the copy. (A specialized ABI can still allow a different way.) — FrankHB, Oct 14 '20 at 18:28
The language standard is ABI-agnostic, so it can't make this a requirement or a guarantee, though. Allowing elision in such cases is still technically feasible, but also somewhat confusing: consider how to make an explicit copy of the object idiomatically in a portable program. It seems that the committee just didn't bother to do the tradeoff. — FrankHB, Oct 14 '20 at 18:30

score 3 · Answer 2 · answered May 15 '11 at 23:24

The rationale, as I understand it, for that restriction is that the calling convention might (and will in many cases) demand that the argument to the function and the return object are at different locations (either memory or registers). Consider the following modified example:

X foo();
X bar( X a ) 
{ 
   return a;
}
int main() {
   X x = bar( foo() );
}

In theory, the full set of copies would be: the return statement in foo ($tmp1), the argument a of bar, the return statement of bar ($tmp2), and x in main. Compilers can elide two of the four objects by creating $tmp1 at the location of a and $tmp2 at the location of x. When the compiler is processing main, it can note that the return value of foo is the argument to bar and make them coincide. At that point it cannot know (without inlining) that the argument and the return value of bar are the same object, and it has to comply with the calling convention, so it places $tmp1 in the position of the argument to bar.

At the same time, it knows that the only purpose of $tmp2 is to create x, so it can place both at the same address. Inside bar, there is not much that can be done: the argument a is located in the place of the first argument, according to the calling convention, and $tmp2 has to be located where the calling convention puts the return value. In the general case that is a different location; consider that the example can be extended to a bar that takes more arguments, only one of which is returned.

Now, if the compiler performs inlining, it could detect that the extra copy that would be required if the function was not inlined is really not needed, and it would have a chance to elide it. If the standard allowed that particular copy to be elided, then the same code would behave differently depending on whether the function is inlined or not.
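The object accounting in this answer can be checked with an instrumented X. This is a sketch under assumptions: the body of foo, the counters, and the helper `run_chain` are invented here for illustration, and the exact number of constructions depends on the language version and on which permitted elisions the compiler performs.

```cpp
struct X {
    static int constructed;
    static int destroyed;
    static int moves;
    X() { ++constructed; }
    X(const X&) { ++constructed; }
    X(X&&) noexcept { ++constructed; ++moves; }
    ~X() { ++destroyed; }
};
int X::constructed = 0;
int X::destroyed = 0;
int X::moves = 0;

X foo() { return X(); }   // hypothetical body; the answer only declares foo
X bar(X a) { return a; }  // parameter -> return object cannot be elided

// Hypothetical helper: runs `X x = bar(foo());` in its own scope.
void run_chain() {
    X::constructed = X::destroyed = X::moves = 0;
    {
        X x = bar(foo());
        (void)x;
    }
}
```

After `run_chain()`, every constructed object has been destroyed, at least two objects were constructed (a and x), and at least one move happened: the one from a into the return object of bar.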

David Rodríguez - dribeas

That's a reason not to do it sometimes in some situations, not a reason to flat out disallow it. The same code already has different behaviours depending on copy elision. – Puppy May 17 '11 at 10:45
@DeadMG: I don't quite follow the argument, if that was not disallowed, a compliant compiler would produce two different results for basically the same code based on actual inlining (not on the `inline` identifier being present, but real code inlining). – David Rodríguez - dribeas May 17 '11 at 13:46
So all your reply boils down to need for expected side-effects in act of making additional unneeded copies? Basically we may lose oh-so-desired side effects of copy constructors and destructors of all these temporaries that are not called because of compiler optimized them out? For me a program whose correctness depends on such things is extremely hard to imagine as maintainable, robust or even readable. – Öö Tiib May 28 '11 at 22:11
@Öö Tiib: I am not sure I follow your argument, whenever the compiler is able to optimize away a copy, what it does is remove the existence of one of the objects by having both be aliases of a single object, but in doing so the number of constructors/destructors executed is evened, that is, for each call to a constructor, a destructor is called. Each acquired resource is released. If the constructor/destructor has extra side effects (besides construction/destruction of the object) those may differ. Consider a class that counts the number of instances created, that value will differ. – David Rodríguez - dribeas May 29 '11 at 22:59
`struct test { static int created; static int destroyed; test() { ++created; } test( test const & ) { ++created; } ~test() { ++destroyed; } }; int test::created = 0; int test::destroyed = 0;` The final numbers for `test::created` and `test::destroyed` may differ, but that is something that has been so for a very long time already. The current standard does allow for copy elision, which would have that exact same problems. That is, programmers *must* know where and when copies might be elided, and understand what that means and how that can affect the semantics of your program. – David Rodríguez - dribeas May 29 '11 at 23:05
OK, but we already have optional copy elision, so such counts may already differ per platform, and also per function if the compiler found it profitable to elide copies in one function and not in another. So the majority of us likely do not care if some more unneeded copies are elided as well, for example the copy from function parameter to return value. – Öö Tiib May 30 '11 at 22:39
@Öö Tiib: In the current standard, for a given compiled program, the copy is either done or elided. If you allow copy elision from a function argument to the function return for actually inlined functions, then the same function in the same program might elide the copy or not --a function can be inlined and not inlined in different places in the same program. Does that matter enough? I guess it does for the people working in the standard. – David Rodríguez - dribeas May 31 '11 at 07:23
@DavidRodríguez-dribeas — I believe the point you're missing (or at least *were* missing, two years ago) is that C++ *has always* allowed a conforming implementation to inline the same function in multiple places, and perform optional copy elision differently in the different places. (Or any other unspecified behavior, for that matter.) The situation you were so afraid of is in fact totally legal and already happens today. So we're still left with Johannes' original question: what scared the Danish enough to open [DR 1148](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1148)? – Quuxplusone Dec 17 '13 at 10:40

score 2 · Answer 3 · edited May 23 '17 at 12:16

David Rodríguez - dribeas' answer to my question 'How to allow copy elision construction for C++ classes' gave me the following idea. The trick is to use lambdas to delay evaluation until inside the function body:

#include <iostream>

struct S
{
  S() {}
  S(const S&) { std::cout << "Copy" << std::endl; }
  S(S&&) { std::cout << "Move" << std::endl; }
};

S f1(S a) {
  return a;
}

S f2(const S& a) {
  return a;
}

#define DELAY(x) [&]{ return x; }

template <class F>
S f3(const F& a) {
  return a();
}

int main()
{
  S t;
  std::cout << "Without delay:" << std::endl;
  S s1 = f1(t);
  std::cout << "With delay:" << std::endl;
  S s2 = f3(DELAY(t));
  std::cout << "Without delay pass by ref:" << std::endl;
  S s3 = f2(t);
  std::cout << "Without delay pass by ref (temporary) (should have 0 copies, will get 1):" << std::endl;
  S s4 = f2(S());
  std::cout << "With delay (temporary) (no copies, best):" << std::endl;
  S s5 = f3(DELAY(S()));
}

This outputs on ideone GCC 4.5.1:

Without delay:
Copy
Copy
With delay:
Copy

Now this is good, but one could suggest that the DELAY version is just like passing by const reference, as below:

Without delay pass by ref:
Copy

But if we pass a temporary by const reference, we still get a copy:

Without delay pass by ref (temporary) (should have 0 copies, will get 1):
Copy

Where the delayed version elides the copy:

With delay (temporary) (no copies, best):

As you can see, this elides all copies in the temporary case.

The delayed version produces one copy in the non-temporary case, and no copies in the case of a temporary. I don't know of any way to achieve this other than lambdas, but I'd be interested to hear if there is one.
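For reference, the same trick can be written without the DELAY macro and with counters instead of printing, which makes the two cases easy to check. This is a variant sketched for illustration: the counters and the helpers `copies_for_lvalue` and `copies_for_temporary` are names introduced here, not part of the program above.

```cpp
struct S {
    static int copies;
    static int moves;
    S() = default;
    S(const S&) { ++copies; }
    S(S&&) noexcept { ++moves; }
};
int S::copies = 0;
int S::moves = 0;

// Same idea as f3: take any callable that produces an S.
template <class F>
S f3(F&& make) {
    return make();  // the S produced by make() becomes the result
}

// Hypothetical helpers returning the number of copies observed.
int copies_for_lvalue() {
    S::copies = 0;
    S t;
    S s = f3([&]() -> S { return t; });  // one copy: t -> result
    (void)s;
    return S::copies;
}

int copies_for_temporary() {
    S::copies = 0;
    S s = f3([]() -> S { return S(); });  // nothing to copy
    (void)s;
    return S::copies;
}
```

`copies_for_lvalue()` reports one copy, because t is captured by reference and has to be copied into the result, and `copies_for_temporary()` reports none.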

score 1 · Answer 4 · answered May 15 '11 at 16:47

It is unreasonable to elide the copy from t to a. The parameter is declared mutable, so the copy is made because the function is expected to modify it.

From a to the return value, I cannot see any reason to copy. Perhaps it is some sort of oversight? By-value parameters feel like locals inside the function body; I see no difference there.

Öö Tiib

Not an oversight, I think. C++03 doesn't make a special case of function parameters (so I think the elision is allowed in C++03, perhaps unintentionally). The C++0x FDIS adds "(other than a function or catch-clause parameter)" to the text permitting NRVO. – Steve Jessop May 15 '11 at 17:02
Then i see no other reason but some members of committee whose legacy libraries pass by const reference and then copy. So they decided to encourage that pattern. – Öö Tiib May 15 '11 at 17:13
For everyone interested: I asked in the #llvm channel, and they said that probably no one thought of that optimization. I wonder how clang/gcc cope with delete/new calls in copy ctors of typical C++03 classes though, and whether they can optimize the copy away knowing there aren't observable side effects. Otherwise, having the spec not forbid it would be nice, if there aren't any problems, I think! – Johannes Schaub - litb May 15 '11 at 20:04
@JohannesSchaub-litb It's not true that "no one thought of that optimization". Someone in the Danish national body *did* notice that the optimization was permitted, and went out of their way to *disallow* in C++11 what had previously been *allowed* in C++03. (source: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1148 ) So we need a better answer than *that*. There must be some reason that the Danish body believed this optimization was too dangerous to permit. (Tinfoil hat time, but Öö Tiib might be on to something with his suggestion above...) – Quuxplusone Dec 17 '13 at 10:02

score 0 · Answer 5 · answered Mar 10 '12 at 06:53

I feel it is because an alternative is always available for the optimization:

S& f(S& a) { return a; }  // pass & return by reference
^^^  ^^^

If f() is coded as in your example, then it's perfectly reasonable to assume that the copy is intended or that side effects are expected; otherwise, why not choose to pass and return by reference?

Suppose NRVO did apply (as you ask): then there would be no difference between S f(S) and S& f(S&)!

NRVO kicks in for situations like operator+() because there is no worthy alternative.

One supporting aspect: all the functions below have different copying behavior:

S& f(S& a) { return a; }  // 0 copy
S f(S& a) { return a; } // 1 copy
S f(S a) { S a1; return (...)? a : a1; }  // 2 copies

In the third snippet, if (...) is known at compile time to be false, then the compiler generates only 1 copy.
This means that the compiler purposefully doesn't perform the optimization when a trivial alternative is available.

iammilind

@MartinBa, `const S& f(const S& a);` will work for temporaries as well. – iammilind Nov 15 '12 at 11:24
Yeah, the `const&` version will of course work for temps as well, but that's not what you present in your answer. (And the arg being const seems really useless) – Martin Ba Nov 15 '12 at 20:10
@MartinBa, that's my argument. We don't need another optimization as asked by OP. There are several alternatives already available; some are useful, some are useless. – iammilind Nov 16 '12 at 04:20
@iammilind's proposed workaround is ridiculously inefficient in the case where `S` is a template type parameter with the type `int`. One of the great advantages of C++ is that we usually don't have to write different code for primitive types versus user-defined types; let's not throw that advantage away if we don't have to. **Besides,** the question wasn't "how do I work around this issue", it was "why does this issue exist in the first place". – Quuxplusone Dec 17 '13 at 09:54
"_Suppose if NRVO applies (as you ask) then there is no difference between `S f(S)` and `S& f(S&)`!_" This is wrong. This implies the caller who passed in an `S` by value would see it being modified. But that isn't how it would work. The argument by value is always copied, so if RVO were applied, it would be applied to the copy that gets returned, not the instance that was passed in. The modifications done by the function would only be visible in its returned value, not whatever instance the caller had passed in. So, this answer not only doesn't answer what was asked, but its tangent is wrong. – underscore_d Sep 23 '18 at 19:34
Even with a `const` qualified parameter, the workaround is not valid if the argument is expected not to be odr-used. (In the C++11/14 era, this is especially troublesome with `constexpr` objects declared as `static` class members. Ironically, the requirements on `constexpr` functions may reduce the pain.) – FrankHB Oct 14 '20 at 18:37

score -2 · Answer 6 · answered Mar 05 '12 at 22:39

I think the issue is that if the copy constructor does something, then the compiler must do that thing a predictable number of times. If you have a class that increments a counter every time it's copied, for example, and there's a way to access that counter, then a standards-compliant compiler must do that operation a well-defined number of times (otherwise, how would one write unit tests?)

Now, it's probably a bad idea to actually write a class like that, but it's not the compiler's job to figure that out, only to make sure that the output is correct and consistent.

bdow

Copy elision is specifically allowed in certain circumstances, regardless of whether the copy constructor has side effects. – ildjarn Mar 05 '12 at 22:40
Ok, that's true. But I think those are circumstances where the sequence of events is not guaranteed anyway. – bdow Mar 05 '12 at 22:51
My point is, your rationale makes no sense because there are _already_ legal circumstances where the copy constructor will be skipped. – ildjarn Mar 05 '12 at 22:53
