
I saw code somewhere in which someone decided to copy an object and subsequently move it into a data member of a class. This confused me, because I thought the whole point of moving was to avoid copying. Here is the example:

#include <string>
#include <utility>

struct S
{
    std::string data;

    S(std::string str) : data(std::move(str))
    {}
};

Here are my questions:

  • Why aren't we taking an rvalue-reference to str?
  • Won't a copy be expensive, especially given something like std::string?
  • What would be the reason for the author to decide to make a copy then a move?
  • When should I do this myself?
user2030677
  • Looks like a silly mistake to me, but I'll be interested to see if somebody with more knowledge on the subject has anything to say about it. – Dave May 23 '13 at 22:05
  • possible duplicate of [Are the days of passing const std::string & as a parameter over?](http://stackoverflow.com/questions/10231349/are-the-days-of-passing-const-stdstring-as-a-parameter-over) – Nicol Bolas May 23 '13 at 22:33
  • [This Q&A I initially forgot to link](http://stackoverflow.com/questions/15600499/how-to-pass-parameters-correctly/15600615#15600615) may also be relevant to the topic. – Andy Prowl May 24 '13 at 09:17
  • Possibly relevant : [Should I write constructors using rvalues for std::string?](http://stackoverflow.com/questions/10836221/should-i-write-constructors-using-rvalues-for-stdstring) – Bartek Banachewicz May 24 '13 at 11:38



Before I answer your questions, one thing you seem to be getting wrong: taking by value in C++11 does not always mean copying. If an rvalue is passed, that will be moved (provided a viable move constructor exists) rather than being copied. And std::string does have a move constructor.

Unlike in C++03, in C++11 it is often idiomatic to take parameters by value, for the reasons I am going to explain below. Also see this Q&A on StackOverflow for a more general set of guidelines on how to accept parameters.

Why aren't we taking an rvalue-reference to str?

Because that would make it impossible to pass lvalues, such as in:

std::string s = "Hello";
S obj(s); // s is an lvalue, this won't compile!

If S only had a constructor that accepts rvalues, the above would not compile.
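A quick compile-time check of this point, as a sketch: the struct name RvalueOnly is hypothetical, and std::is_constructible is used to ask whether a constructor call would be well-formed without writing the failing line itself.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical type with only an rvalue-reference constructor.
struct RvalueOnly
{
    std::string data;
    explicit RvalueOnly(std::string&& str) : data(std::move(str)) {}
};

// An lvalue std::string cannot bind to std::string&&, so this is rejected...
static_assert(!std::is_constructible<RvalueOnly, std::string&>::value,
              "an lvalue std::string is rejected");

// ...while rvalues are accepted: an xvalue produced by std::move, or a
// temporary std::string created from a C string.
static_assert(std::is_constructible<RvalueOnly, std::string&&>::value,
              "an rvalue std::string is accepted");
static_assert(std::is_constructible<RvalueOnly, const char*>::value,
              "a temporary built from a C string is accepted");
```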

Won't a copy be expensive, especially given something like std::string?

If you pass an rvalue, that will be moved into str, and that will eventually be moved into data. No copying will be performed. If you pass an lvalue, on the other hand, that lvalue will be copied into str, and then moved into data.

So to sum it up, two moves for rvalues, one copy and one move for lvalues.
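To see those counts in action, here is a sketch that swaps std::string for a hypothetical instrumented type, Tracker, which simply counts copy and move constructions (std::string does not expose such counts). The struct S has the same shape as the one in the question.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracker
{
    static int copies;
    static int moves;

    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }

    static void reset() { copies = 0; moves = 0; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Same shape as the question's S, holding a Tracker instead of a std::string.
struct S
{
    Tracker data;
    explicit S(Tracker t) : data(std::move(t)) {}
};

// Usage sketch: checks "one copy + one move" for lvalues, "two moves" for rvalues.
void check_by_value_counts()
{
    Tracker lv;

    Tracker::reset();
    S fromLvalue(lv);            // lv is copied into t, then t is moved into data
    assert(Tracker::copies == 1 && Tracker::moves == 1);

    Tracker::reset();
    S fromRvalue(std::move(lv)); // lv is moved into t, then t is moved into data
    assert(Tracker::copies == 0 && Tracker::moves == 2);
}
```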

What would be the reason for the author to decide to make a copy then a move?

First of all, as I mentioned above, the first operation is not always a copy. That said, the answer is: "Because it is efficient (moves of std::string objects are cheap) and simple".

Under the assumption that moves are cheap (ignoring the small string optimization, SSO, here), they can be practically disregarded when considering the overall efficiency of this design. If we do so, we have one copy for lvalues (as we would have if we accepted an lvalue reference to const) and no copies for rvalues (whereas we would still have a copy if we accepted an lvalue reference to const).

This means that taking by value is as good as taking by lvalue reference to const when lvalues are provided, and better when rvalues are provided.
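The comparison with the reference-to-const alternative can be checked the same way. This sketch again uses a hypothetical counting type, Tracker, in place of std::string; the point it shows is that a constructor taking a reference to const still copies when it is handed an rvalue, while the by-value constructor copies nothing.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracker
{
    static int copies;
    static int moves;

    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }

    static void reset() { copies = 0; moves = 0; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// C++03 style: take by lvalue reference to const, copy into the member.
struct ByConstRef
{
    Tracker data;
    explicit ByConstRef(const Tracker& t) : data(t) {}
};

// The question's style: take by value, then move into the member.
struct ByValue
{
    Tracker data;
    explicit ByValue(Tracker t) : data(std::move(t)) {}
};

// Usage sketch: hand each design an rvalue and count the work done.
void compare_with_rvalue()
{
    Tracker a, b;

    Tracker::reset();
    ByConstRef r(std::move(a)); // binds to const&, but data(t) still copies
    assert(Tracker::copies == 1 && Tracker::moves == 0);

    Tracker::reset();
    ByValue v(std::move(b));    // move into t, move into data: no copy
    assert(Tracker::copies == 0 && Tracker::moves == 2);
}
```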

P.S.: To provide some context, I believe this is the Q&A the OP is referring to.

Community
Andy Prowl
  • Worth mentioning that it's a C++11 pattern that replaces `const T&` argument passing: in the worst case (lvalue) this is the same, but in the case of a temporary you only have to move the temporary. Win-win. – syam May 23 '13 at 22:08
  • hmm, so not a mistake, but a clever way to optimise rvalues while still allowing lvalues. Does this mean the move constructor will actually be called twice, or is the compiler allowed to optimise it away to a single move? – Dave May 23 '13 at 22:09
  • Aha! So there *is* a copy being done (with lvalues in particular). Also, you didn't answer my question on whether or not a copy will be expensive (for an lvalue in this case). – user2030677 May 23 '13 at 22:10
  • @user2030677: There's no getting around that copy, unless you're storing a reference. – Benjamin Lindley May 23 '13 at 22:11
  • @user2030677: Who cares how expensive the copy is as long as you need it (and you do, if you want to hold a *copy* in your `data` member)? You would have a copy even if you took by lvalue reference to `const`. – Andy Prowl May 23 '13 at 22:13
  • *"This means that taking by value is as good as taking by lvalue reference to const when lvalues are provided"* -- not quite. There is no move required in that case. For optimal efficiency, you would provide an overload for both r-values and l-values. The appeal of this method is that it is less typing (especially if more than one parameter needs to be moved/copied), and therefore less maintenance, while still being quite efficient. – Benjamin Lindley May 23 '13 at 22:14
  • @BenjaminLindley: As a preliminary, I wrote: "*Under the assumption that moves are cheap, they can be practically disregarded when considering the overall efficiency of this design.*". So yes, there would be the overhead of a move, but that should be considered negligible unless there is proof that this is a real concern that justifies changing a simple design into something more efficient. – Andy Prowl May 23 '13 at 22:16
  • @Dave: Sorry, I somehow missed your comment. Yes, in C++11 taking by value is the idiomatic solution much more often than in C++03. And no, the elision can't be performed. There are only 4 cases when it is allowed, they are listed in paragraph 12.8/31. This is not one of them – Andy Prowl May 23 '13 at 22:24
  • Also, what if I just had `: data(str)` and no explicit `std::move`. What would happen - A copy or a move? – user2030677 May 23 '13 at 22:36
  • @user2030677: That would be a copy. – Andy Prowl May 23 '13 at 22:37
  • *You would have a copy even if you would take by lvalue reference to const* - How? When I have this code, no copy is performed -- http://coliru.stacked-crooked.com/view?id=072c2de8a490016a5c27db8150c2e206-ff683aff19d685e086e79e4ef634f9fb – user2030677 May 23 '13 at 22:59
  • @user2030677: But that is a completely different example. In the example from your question you always end up holding a copy in `data`! – Andy Prowl May 23 '13 at 23:01
  • What do you mean a copy in data? In my example the string is always being *moved* into data. Were you saying that despite sending lvalues/rvalues to the constructor taking lvalue ref to const, I will always get a copy? – user2030677 May 23 '13 at 23:15
  • @user2030677: In the example from the question you are moving `str` into data, and if an lvalue is passed to the constructor, you are copying that lvalue into `str`. So you're moving the copy. Which means you always end up holding a copy of the input when an lvalue is provided – Andy Prowl May 23 '13 at 23:17
  • But if the constructor is `S(string const &)` there will be no copies made, so what did you mean by that? – user2030677 May 23 '13 at 23:20
  • @user2030677: Yes there are. If the constructor takes by lvalue reference to `const` (`S(string const& str) : data(str)`) that will make a copy of `str` into data - even if an rvalue is passed to the constructor. If you mean that you would do this: `S(string const& str) : data(move(str))`, that would imply two things. 1) You always want to move from the input, even if that is an lvalue. Extremely dangerous for the client! Which is why 2) ... – Andy Prowl May 23 '13 at 23:24
  • ... `data(move(str))` will NOT actually move from `str`, because `str` is a reference to `const`, and the move constructor of `string` accepts an rvalue reference to non-`const`. Therefore, the copy constructor will be picked – Andy Prowl May 23 '13 at 23:24
  • A move of a std::string is not necessarily cheap. I believe with Visual Studio a std::string is 32 bytes when using the small string optimization. So the example is showing up to 2 moves and potentially copying 64 bytes. Only when the string exceeds the small string optimization size are the moves nearly free. – Roland May 24 '13 at 02:09
  • @Roland: Correct. I thought I mentioned it in my answer, but that was actually in the other answer I linked, which has a high degree of overlap - so I got confused. I edited. Nevertheless, moves are still O(1) while copies are O(N), so at least from the theoretical perspective, moves are still "cheap". – Andy Prowl May 24 '13 at 09:10
  • A great answer as usual @AndyProwl. Thanks for that, it helped me understand the new move facilities much better. – Marc Claesen May 24 '13 at 11:42
  • @MarcClaesen: Glad you found it helpful! – Andy Prowl May 24 '13 at 11:43
  • Wouldn't we be able to avoid the copy with lvalues by `std::move`-ing them manually to `str`? – David G May 24 '13 at 11:44
  • @0x499602D2: What example are you referring to? Sorry for asking, but with the OP's code, my code, the code in comments... I'm uncertain about what the context of your question is ;) – Andy Prowl May 24 '13 at 11:50
  • I'm referring to the OP's example, but you said there is one copy and one move with an lvalue: Can't this be avoided by `std::move`-ing the lvalue to the constructor so a copy doesn't take place? For example -- http://ideone.com/mQo6pn – David G May 24 '13 at 11:56
  • @0x499602D2: Oh, well, yes sure in that case you would not have a copy, but the input of the constructor would not be an lvalue. The result of `std::move(lvalue)` is an xvalue, which is also an rvalue. I understand what you mean though: "*starting with an lvalue, it is possible to have no copying if you explicitly `std::move()` it*", while my sentence was more like "*assuming the constructor would receive an lvalue in input, then there would be one copy and one move*". – Andy Prowl May 24 '13 at 12:00
  • "This means that taking by value is as good as taking by lvalue reference to const when lvalues are provided, and better when rvalues are provided." - for clarification, this sentence applies only to the case where the function is to store the string, as in this question. More generally, if the function only needs to read the string then this discussion doesn't apply. – M.M Jul 24 '14 at 01:24
  • @MattMcNabb: That's correct. I wrote more about the general case in [this answer](http://stackoverflow.com/questions/15600499/how-to-pass-parameters-correctly/15600615#15600615). – Andy Prowl Jul 24 '14 at 08:32
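The point made in the comments above about `data(move(str))` when `str` is a reference to `const` can be checked directly. In this sketch (Tracker and MoveFromConstRef are hypothetical names), std::move(t) yields a const Tracker&&, which cannot bind to the move constructor's Tracker&& parameter, so the copy constructor is selected instead.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracker
{
    static int copies;
    static int moves;

    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }

    static void reset() { copies = 0; moves = 0; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

struct MoveFromConstRef
{
    Tracker data;
    // std::move(t) has type const Tracker&&; Tracker(Tracker&&) cannot accept it,
    // so overload resolution falls back to Tracker(const Tracker&).
    explicit MoveFromConstRef(const Tracker& t) : data(std::move(t)) {}
};

// Usage sketch: even an rvalue argument ends up being copied.
void check_const_move_copies()
{
    Tracker src;
    Tracker::reset();
    MoveFromConstRef m(std::move(src));
    assert(Tracker::copies == 1 && Tracker::moves == 0);
}
```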

To understand why this is a good pattern, we should examine the alternatives, both in C++03 and in C++11.

We have the C++03 method of taking a std::string const&:

struct S
{
  std::string data; 
  S(std::string const& str) : data(str)
  {}
};

In this case, there will always be a single copy performed. If you construct from a raw C string, a std::string will be constructed, then copied again: two allocations.

There is the C++03 method of taking a non-const reference to a std::string, then swapping it into the member std::string:

struct S
{
  std::string data; 
  S(std::string& str)
  {
    std::swap(data, str);
  }
};

That is the C++03 version of "move semantics", and swap can often be optimized to be very cheap (much like a move). It should also be analyzed in context:

S tmp("foo"); // illegal
std::string s("foo");
S tmp2(s); // legal

This forces you to form a non-temporary std::string and then discard it (a temporary std::string cannot bind to a non-const reference). Only one allocation is done, however. The C++11 version would take a && and require you to call it with std::move or with a temporary: this requires the caller to explicitly create a copy outside of the call and move that copy into the function or constructor.

struct S
{
  std::string data; 
  S(std::string&& str): data(std::move(str))
  {}
};

Use:

S tmp("foo"); // legal
std::string s("foo");
S tmp2(std::move(s)); // legal

Next, we can do the full C++11 version, that supports both copy and move:

struct S
{
  std::string data; 
  S(std::string const& str) : data(str) {} // lvalue const, copy
  S(std::string && str) : data(std::move(str)) {} // rvalue, move
};

We can then examine how this is used:

S tmp( "foo" ); // a temporary `std::string` is created, then moved into tmp.data

std::string bar("bar"); // bar is created
S tmp2( bar ); // bar is copied into tmp2.data

std::string bar2("bar2"); // bar2 is created
S tmp3( std::move(bar2) ); // bar2 is moved into tmp3.data

It is pretty clear that this 2-overload technique is at least as efficient as, if not more efficient than, the two C++03 styles above. I'll dub this 2-overload version the "most optimal" version.

Now, we'll examine the take-by-copy version:

struct S2 {
  std::string data;
  S2( std::string arg ):data(std::move(arg)) {}
};

in each of those scenarios:

S2 tmp( "foo" ); // a temporary `std::string` is created, moved into arg, then moved into S2::data

std::string bar("bar"); // bar is created
S2 tmp2( bar ); // bar is copied into arg, then moved into S2::data

std::string bar2("bar2"); // bar2 is created
S2 tmp3( std::move(bar2) ); // bar2 is moved into arg, then moved into S2::data

If you compare this side-by-side with the "most optimal" version, we do exactly one additional move! Not once do we do an extra copy.
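The side-by-side comparison can be sketched with a hypothetical counting type, Tracker, standing in for std::string. The helper measure is also hypothetical, written only to construct a holder from an argument (keeping its value category) and report the counts.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracker
{
    static int copies;
    static int moves;

    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }

    static void reset() { copies = 0; moves = 0; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// The "most optimal" two-overload version.
struct TwoOverloads
{
    Tracker data;
    explicit TwoOverloads(const Tracker& t) : data(t) {}       // lvalue: copy
    explicit TwoOverloads(Tracker&& t) : data(std::move(t)) {} // rvalue: move
};

// The take-by-value version.
struct ByValue
{
    Tracker data;
    explicit ByValue(Tracker t) : data(std::move(t)) {}
};

struct Counts { int copies; int moves; };

// Constructs a Holder from arg, preserving arg's value category,
// and reports how many copies and moves that took.
template <typename Holder, typename Arg>
Counts measure(Arg&& arg)
{
    Tracker::reset();
    Holder h(std::forward<Arg>(arg));
    (void)h;
    return Counts{Tracker::copies, Tracker::moves};
}
```

With an lvalue, TwoOverloads does 1 copy and 0 moves while ByValue does 1 copy and 1 move; with an rvalue, it is 0 copies and 1 move versus 0 copies and 2 moves. The copy counts match, and the difference is one move in each case.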

So if we assume that move is cheap, this version gets us nearly the same performance as the most-optimal version, but with half the code.

And if you are taking, say, 2 to 10 arguments, the reduction in code is exponential: 2x less with 1 argument, 4x with 2, 8x with 3, 16x with 4, and 1024x with 10 arguments.
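For two parameters, the doubling looks like this. A sketch with hypothetical names: PairOverloads needs 2^2 = 4 constructors to cover every lvalue/rvalue combination, while PairByValue covers the same combinations with a single constructor.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Overload approach: one constructor per lvalue/rvalue combination (2^2 = 4).
struct PairOverloads
{
    std::string first, second;
    PairOverloads(const std::string& a, const std::string& b)
        : first(a), second(b) {}
    PairOverloads(const std::string& a, std::string&& b)
        : first(a), second(std::move(b)) {}
    PairOverloads(std::string&& a, const std::string& b)
        : first(std::move(a)), second(b) {}
    PairOverloads(std::string&& a, std::string&& b)
        : first(std::move(a)), second(std::move(b)) {}
};

// By-value approach: a single constructor covers all four combinations.
struct PairByValue
{
    std::string first, second;
    PairByValue(std::string a, std::string b)
        : first(std::move(a)), second(std::move(b)) {}
};

// Usage sketch: both accept a mix of an lvalue and an rvalue.
void check_pairs()
{
    std::string left = "left";
    PairOverloads p(left, std::string("right"));
    PairByValue q(left, std::string("right"));
    assert(p.first == "left" && p.second == "right");
    assert(q.first == "left" && q.second == "right");
    assert(left == "left"); // the lvalue was copied, not moved from
}
```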

Now, we can get around this via perfect forwarding and SFINAE, which lets you write a single constructor or function template that takes 10 arguments, uses SFINAE to ensure that the arguments are of appropriate types, and then moves or copies them into the local state as required. While this prevents the thousandfold increase in program size, a whole pile of functions can still be generated from this template (each template instantiation generates a function).
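A one-parameter sketch of the perfect-forwarding plus SFINAE approach (the names Forwarding and NotAString are hypothetical). The std::enable_if condition removes the template from overload resolution when std::string cannot be constructed from the argument. Note that T is deduced separately for each argument category and type, for example std::string&, std::string, and const char(&)[N] for each string-literal length N, which is where the extra instantiations come from.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct Forwarding
{
    std::string data;

    // Participates in overload resolution only if std::string is
    // constructible from the argument; otherwise SFINAE removes it.
    template <typename T,
              typename = typename std::enable_if<
                  std::is_constructible<std::string, T&&>::value>::type>
    explicit Forwarding(T&& str) : data(std::forward<T>(str)) {}
};

struct NotAString {}; // hypothetical type with no conversion to std::string

static_assert(std::is_constructible<Forwarding, std::string&>::value, "lvalue accepted");
static_assert(std::is_constructible<Forwarding, std::string&&>::value, "rvalue accepted");
static_assert(!std::is_constructible<Forwarding, NotAString>::value, "rejected by SFINAE");

// Usage sketch: each call below instantiates a different constructor.
void check_forwarding()
{
    std::string s = "copied";
    Forwarding a(s);                    // T = std::string&: copies s
    Forwarding b(std::string("moved")); // T = std::string: moves the temporary
    Forwarding c("literal");            // T = const char(&)[8]: builds data directly
    assert(a.data == "copied" && s == "copied");
    assert(b.data == "moved");
    assert(c.data == "literal");
}
```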

And lots of generated functions means larger executable code size, which can itself reduce performance.

For the cost of a few moves, we get shorter code, nearly the same performance, and often easier-to-understand code.

Now, this only works because we know, when the function (in this case, a constructor) is called, that we will be wanting a local copy of that argument. The idea is that if we know that we are going to be making a copy, we should let the caller know that we are making a copy by putting it in our argument list. They can then optimize around the fact that they are going to give us a copy (by moving into our argument, for example).

Another advantage of the "take by value" technique is that move constructors are often noexcept. That means functions that take by value and move out of their argument can often be noexcept, moving any throws out of their body and into the calling scope (which can sometimes avoid them via direct construction, or construct the items and move them into the argument, to control where throwing happens). Making methods noexcept is often worth it.
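A sketch of the noexcept point, assuming a hypothetical ByValueNoexcept type. std::is_nothrow_constructible reports whether the whole construction, including initializing the by-value parameter, can throw: from an rvalue only noexcept moves happen, while from an lvalue the std::string copy may throw, and that copy happens at the call site rather than inside the constructor.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Take-by-value constructor marked noexcept: its body only moves, and
// std::string's move constructor is noexcept.
struct ByValueNoexcept
{
    std::string data;
    explicit ByValueNoexcept(std::string str) noexcept : data(std::move(str)) {}
};

// From an rvalue, the parameter is move-constructed, so nothing can throw.
static_assert(std::is_nothrow_constructible<ByValueNoexcept, std::string&&>::value,
              "rvalue argument: no throw possible");

// From an lvalue, the copy into the parameter may throw (e.g. std::bad_alloc),
// but that copy happens before the constructor body is entered.
static_assert(!std::is_nothrow_constructible<ByValueNoexcept, const std::string&>::value,
              "lvalue argument: the copy may throw");

// Usage sketch.
void check_noexcept_ctor()
{
    std::string s = "hello";
    ByValueNoexcept a(s);            // copy happens here, at the call site
    ByValueNoexcept b(std::move(s)); // only moves: cannot throw
    assert(a.data == "hello" && b.data == "hello");
}
```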

Yakk - Adam Nevraumont
  • I would also add if we know that we will make a copy, we should let the compiler do it, because the compiler always knows better. – Rayniery May 24 '13 at 19:34
  • Since I wrote this, another advantage was pointed out to me: often copy constructors can throw, while move constructors are often `noexcept`. By taking data by-copy, you can make your function `noexcept`, and have any copy construction caused potential throws (like out of memory) occur **outside** your function invocation. – Yakk - Adam Nevraumont Apr 16 '14 at 18:17
  • Why do you need the "lvalue non-const, copy" version in the 3 overload technique? Doesn't the "lvalue const, copy" also handle the non const case? – Bruno Martinez Apr 18 '14 at 20:30
  • @BrunoMartinez we don't! – Yakk - Adam Nevraumont Jul 24 '14 at 01:20

This is probably intentional and is similar to the copy-and-swap idiom. Basically, since the string is copied before the constructor runs, the constructor itself is exception safe, as it only swaps (moves) the by-value parameter str.
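To make the analogy concrete, here is a minimal copy-and-swap assignment operator (Widget is a hypothetical class). Taking the parameter by value puts the potentially throwing copy before the function body, just as the by-value constructor in the question does; the body then only swaps.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical class illustrating the copy-and-swap idiom.
class Widget
{
public:
    explicit Widget(std::string name) : name_(std::move(name)) {}
    Widget(const Widget&) = default;
    Widget(Widget&&) noexcept = default;

    // The parameter is taken by value, so any throwing copy happens before the
    // body runs; the body only swaps, so *this is never left half-assigned.
    Widget& operator=(Widget other) noexcept
    {
        name_.swap(other.name_);
        return *this;
    }

    const std::string& name() const { return name_; }

private:
    std::string name_;
};

// Usage sketch.
void check_copy_and_swap()
{
    Widget a(std::string("alpha"));
    Widget b(std::string("beta"));
    a = b; // b is copied into other, then swapped into a
    assert(a.name() == "beta" && b.name() == "beta");
}
```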

Joe

You don't want to repeat yourself by writing a constructor for the move and one for the copy:

S(std::string&& str) : data(std::move(str)) {}
S(const std::string& str) : data(str) {}

This is a lot of boilerplate code, especially if you have multiple arguments. Your solution avoids that duplication at the cost of an unnecessary move. (The move operation should be quite cheap, however.)

The competing idiom is to use perfect forwarding:

template <typename T>
S(T&& str) : data(std::forward<T>(str)) {}

The template magic will choose to move or copy depending on the argument that you pass in. It basically expands to the first version, where both constructors were written by hand. For background information, see Scott Meyers's post on universal references.

From a performance standpoint, the perfect forwarding version is superior to your version, as it avoids the unnecessary moves. However, one can argue that your version is easier to read and write. The possible performance impact should not matter in most situations anyway, so in the end it seems to be a matter of style.
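A sketch of the move counts behind that comparison, using a hypothetical counting type, Tracker, in place of std::string. The forwarding constructor copies or moves straight into data, while the by-value constructor does one extra move in each case. One caveat of the unconstrained template shown here: it is also a better match than the copy constructor when copying a non-const Forwarding lvalue, which is one reason it is often constrained with SFINAE.

```cpp
#include <cassert>
#include <utility>

// Hypothetical instrumented type: counts copy and move constructions.
struct Tracker
{
    static int copies;
    static int moves;

    Tracker() = default;
    Tracker(const Tracker&) { ++copies; }
    Tracker(Tracker&&) noexcept { ++moves; }

    static void reset() { copies = 0; moves = 0; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

// Perfect-forwarding constructor, as in the answer (unconstrained).
struct Forwarding
{
    Tracker data;
    template <typename T>
    explicit Forwarding(T&& t) : data(std::forward<T>(t)) {}
};

// Take-by-value constructor, as in the question.
struct ByValue
{
    Tracker data;
    explicit ByValue(Tracker t) : data(std::move(t)) {}
};

// Usage sketch: same inputs, counted for each design.
void compare_forwarding_and_by_value()
{
    Tracker a, b, c, d;

    Tracker::reset();
    Forwarding f1(a);            // lvalue: copied straight into data
    assert(Tracker::copies == 1 && Tracker::moves == 0);

    Tracker::reset();
    ByValue v1(b);               // lvalue: copy into t, then move
    assert(Tracker::copies == 1 && Tracker::moves == 1);

    Tracker::reset();
    Forwarding f2(std::move(c)); // rvalue: moved straight into data
    assert(Tracker::copies == 0 && Tracker::moves == 1);

    Tracker::reset();
    ByValue v2(std::move(d));    // rvalue: move into t, then move again
    assert(Tracker::copies == 0 && Tracker::moves == 2);
}
```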

Philipp Claßen