11

Consider the following piece of code:

std::vector<int> Foo() {
    std::vector<int> v = Bar();
    return v;
}

return v is O(1), since NRVO will omit the copy, constructing v directly in the storage where the function's return value would otherwise be moved or copied to. Now consider the functionally analogous code:

void Foo(std::vector<int> * to_be_filled) {
    std::vector<int> v = Bar();
    *to_be_filled = v;
}

A similar argument could be made here, as *to_be_filled = v could conceivably be compiled to an O(1) move-assign, since it's a local variable that's going out of scope (it should be easy enough for the compiler to verify that v has no external references in this case, and thus promote it to an rvalue on its last use). Is this the case? Is there a subtle reason why not?

Furthermore, it feels like this pattern can be extended to any context where an lvalue goes out of scope:

void Foo(std::vector<int> * to_be_filled) {
  if (Baz()) {
    std::vector<int> v = Bar();
    *to_be_filled = v;
  }
  ...
}

Do / can / is it useful / reasonable to expect compilers to find patterns such as the *to_be_filled = v and then automatically optimize them to assume rvalue semantics?


Edit:

g++ 7.3.0 does not perform any such optimizations in -O3 mode.

André Harder
  • 354
  • 2
  • 11
  • I'm fairly sure this optimization is not allowed by the standard. However, you could probably submit a proposal for this to be added to the standard (meaning when `v` appears as the only thing on the right hand side of an assignment where `v` is going out of scope, and `v` isn't on the left hand side), so that `v` would be moved or perhaps the copy could even be elided – Justin Jun 16 '18 at 00:28
  • Why don't you try it with a user-defined class that prints messages in the constructor and destructor, to see what gets used. – Barmar Jun 16 '18 at 00:28
  • @Justin: I've been considering submitting a proposal close to this, but would like to work out how general we can make this pattern, how hard / expensive it would be to write an optimization for it, vs how common the pattern is in the real world (i.e.: the example I gave is trivial to detect, but likely not too common). – André Harder Jun 19 '18 at 00:23
  • @Barmar: Good point. I edited question to include the result for g++. Unsuprisingly, this kind of optimization doesn't seem to be implemented, at least in -O3. – André Harder Jun 19 '18 at 00:24
  • 1
    @AndréHarder See the [std-proposals](https://groups.google.com/a/isocpp.org/forum/#!forum/std-proposals) google group. That's where I would start with a proposal (flesh out my ideas a bit, raise it there). Be aware that this is actually quite a difficult corner of the language. There are a lot of gotchas that prevent this for every case, but you may be able to propose it for some *specific* cases. – Justin Jun 19 '18 at 03:12

1 Answers1

12

The compiler is not permitted to arbitrarily decide to transform an lvalue name into an rvalue to be moved from. It can only do so where the C++ standard permits it to do so. Such as in a return statement (and only when its return <identifier>;).

*to_be_filled = v; will always perform a copy. Even if it's the last statement that can access v, it is always a copy. Compilers aren't allowed to change that.

My understanding is that return v is O(1), since NRVO will (in effect) make v into an rvalue, which then makes use of std::vector's move-constructor.

That's not how it works. NRVO would eliminate the move/copy entirely. But the ability for return <identifier>; to be an rvalue is not an "optimization". It's actually a requirement that compilers treat them as rvalues.

Compilers have a choice about copy elision. Compilers don't have a choice about what return <identifier>; does. So the above will either not move at all (if NRVO happens) or will move the object.

Is there a subtle reason why not?

One reason this isn't allowed is because the location of a statement should not arbitrarily change what that statement is doing. See, return <identifier>; will always move from the identifier (if it's a local variable). It doesn't matter where it is in the function. By virtue of being a return statement, we know that if the return is executed, nothing after it will be executed.

That's not the case for arbitrary statements. The behavior of the expression *to_be_filled = v; should not change based on where it happens to be in code. You shouldn't be able to turn a move into a copy just because you add another line to the function.

Another reason is that arbitrary statements can get really complicated really quickly. return <identifier>; is very simple; it copies/moves the identifier to the return value and returns.

By contrast, what happens if you have a reference to v, and that gets used by to_be_filled somehow. Sure that can't happen in your case, but what about other, more complex cases? The last expression could conceivably read from a reference to a moved-from object.

It's a lot harder to do that in return <identifier>; cases.

Nicol Bolas
  • 449,505
  • 63
  • 781
  • 982
  • They *could* use the as-if rule if the copy-ctor doesn't do anything interesting. It doesn't have to be observable in the debugger. – o11c Jun 16 '18 at 00:47
  • @o11c: If the copy *and* move operations are trivial (ie: not observable), then it won't actually matter; it'd do a memcpy either way. But with a non-trivial move (like `vector`), that's observable behavior. – Nicol Bolas Jun 16 '18 at 00:57
  • it's only observable if the copy/move of the *contents* of the vector are interesting. It's allowed to optimize out a new+memcpy+delete. – o11c Jun 16 '18 at 07:26
  • @o11c: The allocation is observable behavior, thanks to the `Allocator` parameter of `vector`. And if it were some `my_vector` type instead of `std::vector`, the choice would be directly observable. The compiler is not allowed to call the move constructor here. Period. – Nicol Bolas Jun 16 '18 at 13:16
  • @NicolBolas: I've fixed my wording regarding NRVO (sorry to make the quote in your answer a bit outdated -- I feel it's better than leaving the inaccuracy there and then confusing future readers). – André Harder Jun 19 '18 at 00:30
  • I like most, though don't necessarily agree with, your first point: _"The behavior of the expression `*to_be_filled = v;` should not change based on where it happens to be in code"_ . I suspect many programmers would prefer O(n) statements becoming O(1), in spite of the code and maintenance subtleties it adds. In particular, this kind of tradeoff seems ideal for a compiler optimization. – André Harder Jun 19 '18 at 01:07
  • @AndréHarder: "*I suspect many programmers would prefer O(n) statements becoming O(1)*" I suspect many programmers who would prefer that know what `std::move` is and how/when to use it ;) And those who don't know that probably ought to learn. After all, if they do change their code so that it's not the last statement, they're still going to want to know how to get their performance back, right? So it's not like you're creating a world where nobody has to `std::move` anything. – Nicol Bolas Jun 19 '18 at 01:50
  • 1
    Slight caveat to "So the above will either not move at all (if NRVO happens) or will move the object." : if the object has a copy-constructor but no move-constructor, then if NRVO doesn't happen, it will copy the object. The standard calls this "copy/move" – M.M Jun 19 '18 at 02:01
  • "But with a non-trivial move (like vector), that's observable behavior." - is it? I recall reading that at least one compiler can now optimise `vector v {1,2,3,4}; int x = v[1];` to `int x = 2;` – M.M Jun 19 '18 at 02:11
  • @NicolBolas: Fair enough. I'd also add that if this optimization were a reality, it might remove some of the incentives for people to learn to do std moves, which could be bad overall. I guess this being a good idea or not it boils down to there being a common case not solved by std::move, or a way to generalize this further. – André Harder Jun 19 '18 at 04:36