180

How much data is copied when a std::vector is returned from a function, and how much of an optimization is it to place the std::vector on the free store (the heap) and return a pointer instead? That is, is:

std::vector<int> *f()
{
  std::vector<int> *result = new std::vector<int>();
  /*
    Insert elements into result
  */
  return result;
}

more efficient than:

std::vector<int> f()
{
  std::vector<int> result;
  /*
    Insert elements into result
  */
  return result;
}

?

Andrew Truckle
Morten
  • 10
    How about passing the vector by reference and then filling it inside `f`? – Kiril Kirov Mar 29 '13 at 13:47
  • 7
    [RVO](http://en.wikipedia.org/wiki/Return_value_optimization) is a pretty basic optimization that most compilers are capable of performing. – Remus Rusanu Mar 29 '13 at 13:49
  • As answers flow in, it may help you to clarify whether you are using C++03 or C++11. The best practices between the two versions vary quite a bit. – Drew Dormann Mar 29 '13 at 13:51
  • 1
    See http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/ – hmjd Mar 29 '13 at 13:52
  • @Kiril Kirov, can I do that without putting it in the argument list of the function, i.e. void f(std::vector &result)? – Morten Mar 29 '13 at 13:53
  • @Morten - I didn't get your question. The way to do what I suggested is what you wrote `void f(std::vector &result)`. Note, that this is a comment, not an answer. I just suggested one more method, you didn't mention. – Kiril Kirov Mar 29 '13 at 13:56
  • @Kiril Kirov, I just wanted to know if I could get the semantics of void f(std::vector &result) by declaring the result as the return value. However, this has been answered below... Anyway, thanks for the input. – Morten Mar 29 '13 at 14:22

9 Answers

239

In C++11, this is the preferred way:

std::vector<X> f();

That is, return by value.

In C++11, std::vector has move semantics, which means the local vector declared in your function will be moved on return, and in some cases the compiler can elide even the move.
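A minimal sketch of why this is cheap, assuming a C++11 or later compiler. The names `make_values`, `g_buffer_inside` and `returned_without_copy` are made up for illustration. The function records the address of the vector's heap buffer just before returning; the caller's vector ends up owning that same buffer whether the compiler elided the move or performed it, so the elements themselves are not copied.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: remembers where the local vector stored its elements.
static const int* g_buffer_inside = nullptr;

std::vector<int> make_values(std::size_t n) {
    std::vector<int> result;
    result.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        result.push_back(static_cast<int>(i));
    }
    g_buffer_inside = result.data();  // address of the heap buffer
    return result;                    // elided (NRVO) or moved, not copied
}

// Returns true if the caller's vector owns the buffer that was filled inside
// make_values, i.e. the elements were not copied on return.
bool returned_without_copy() {
    std::vector<int> values = make_values(1000);
    return values.data() == g_buffer_inside && values.size() == 1000;
}
```

Only the vector's small control block (typically a few pointers) changes hands; the 1000 elements stay where they were allocated.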

Nawaz
  • 6
    Will it be moved even without `std::move`? – Leonid Volnitsky Mar 29 '13 at 13:51
  • 27
    @LeonidVolnitsky: Yes, if it is *local*. In fact, `return std::move(v);` will disable move elision even if it was possible with just `return v;`. So the latter is preferred. – Nawaz Mar 29 '13 at 13:52
  • It is also the preferred way before c++11. But this answer gives a different impression. – juanchopanza Mar 11 '18 at 14:00
  • 2
    @juanchopanza: I don't think so. Before C++11, you could argue against it because the vector will not be moved; and RVO is a compiler-dependent thing! Talk about the things from the 80s & 90s. – Nawaz Mar 11 '18 at 16:01
  • Even so, one could argue that not returning by value is a premature optimization. – juanchopanza Mar 11 '18 at 17:52
  • @juanchopanza: I think you skipped the part where I said: talk about things/optimizations from 80s and 90s. RVO was not that popular back then. – Nawaz Mar 12 '18 at 05:07
  • 4
    My understanding about the return value (by value) is: instead of 'been moved', the return value in the callee is created on the caller's stack, so all operations in the callee are in-place, there is nothing to move in RVO. Is that correct? – r0n9 Sep 05 '18 at 05:59
  • 3
    @r0ng: Yes, that is true. That is how the compilers usually implement RVO. – Nawaz Sep 05 '18 at 06:51
  • RVO is pretty easy to understand; the more difficult case is how to obtain 'copy elision' for parameters, if for example you wanted a function to append to a vector. Ideally function outputs are always returned rather than being passed; however, in the appending case it seems to be more optimal to pass a reference out param to the vector, thus guaranteeing to avoid a copy. – Medran Oct 05 '18 at 14:46
  • What about c++17? – Mayur Nov 27 '18 at 09:59
  • @Mayur: It's the same as in C++11. – Nawaz Nov 29 '18 at 05:22
  • @Mayur there are some differences: https://en.cppreference.com/w/cpp/language/copy_elision – jimifiki Mar 21 '19 at 12:37
  • 1
    @Nawaz It isn't. There is no longer even a move. – Lightness Races in Orbit Dec 11 '19 at 16:04
  • what about `struct`? – lllllllllllll Jun 23 '20 at 13:02
  • Visual Studio has a [history of not always doing RVO](https://stackoverflow.com/q/25963685). – jrh Oct 30 '20 at 13:09
  • I wonder whether the compiler could optimize this properly if the last line of f() is: `return std::vector(result.rbegin(), result.rend());` that is reverse the vector. (Granted, the algorithm could most likely be rewritten to avoid this reversal. So the question is more theoretical.) – Matyas Mar 21 '22 at 22:41
  • @Matyas: The newly created reversed vector will be moved. If that is what you meant by optimization, then yes! – Nawaz Jul 28 '22 at 22:49
119

You should return by value.

The standard has a specific feature to improve the efficiency of returning by value. It's called "copy elision", and more specifically in this case the "named return value optimization (NRVO)".

Compilers don't have to implement it, but then again compilers don't have to implement function inlining (or perform any optimization at all). But the performance of the standard libraries can be pretty poor if compilers don't optimize, and all serious compilers implement inlining and NRVO (and other optimizations).

When NRVO is applied, there will be no copying in the following code:

std::vector<int> f() {
    std::vector<int> result;
    // ... populate the vector ...
    return result;
}

std::vector<int> myvec = f();

But the user might want to do this:

std::vector<int> myvec;
// ... some time later ...
myvec = f();

Copy elision does not prevent a copy here because it's an assignment rather than an initialization. However, you should still return by value. In C++11, the assignment is optimized by something different, called "move semantics". In C++03, the above code does cause a copy, and although in theory an optimizer might be able to avoid it, in practice it's too difficult. So instead of myvec = f(), in C++03 you should write this:

std::vector<int> myvec;
// ... some time later ...
f().swap(myvec);
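The two forms can be compared with a small sketch. It assumes a C++11 or later compiler (so the return itself is a move or is elided), and `Counted`, `copies_with_move_assignment` and `copies_with_swap` are illustrative names, not part of the original code. The element type counts how often it is copied, which makes it possible to check that neither form copies elements.

```cpp
#include <vector>

// Element type that counts how often it is copied (illustrative only).
struct Counted {
    static int copies;
    Counted() = default;
    Counted(const Counted&) { ++copies; }
    Counted& operator=(const Counted&) { ++copies; return *this; }
};
int Counted::copies = 0;

std::vector<Counted> f() {
    std::vector<Counted> result(4);  // four default-constructed elements
    return result;
}

// C++11 and later: assigning from the returned temporary is a move assignment.
int copies_with_move_assignment() {
    Counted::copies = 0;
    std::vector<Counted> myvec;
    myvec = f();
    return Counted::copies;
}

// C++03 idiom from above: swap the temporary's contents into myvec.
int copies_with_swap() {
    Counted::copies = 0;
    std::vector<Counted> myvec;
    f().swap(myvec);
    return Counted::copies;
}
```

Under these assumptions both functions should report zero element copies. On a C++03 compiler only the swap form avoids the copy on assignment, which is the point made above.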

There is another option, which is to offer a more flexible interface to the user:

template <typename OutputIterator> void f(OutputIterator it) {
    // ... write elements to the iterator like this ...
    *it++ = 0;
    *it++ = 1;
}

You can then also support the existing vector-based interface on top of that:

std::vector<int> f() {
    std::vector<int> result;
    f(std::back_inserter(result));
    return result;
}

This might be less efficient than your existing code, if your existing code uses reserve() in a way more complex than just a fixed amount up front. But if your existing code basically calls push_back on the vector repeatedly, then this template-based code ought to be as good.
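As a rough illustration of the flexibility this buys, the sketch below reuses the two `f` overloads and feeds the same generator into other destinations. `fill_deque` and `buffer_sum` are hypothetical helpers added for the example.

```cpp
#include <deque>
#include <iterator>
#include <vector>

// Same shape as the template above: writes its output through `it`.
template <typename OutputIterator>
void f(OutputIterator it) {
    *it++ = 0;
    *it++ = 1;
}

// Vector-based convenience wrapper built on top of the generic version.
std::vector<int> f() {
    std::vector<int> result;
    f(std::back_inserter(result));
    return result;
}

// The generator can fill a different container without a temporary vector.
std::deque<int> fill_deque() {
    std::deque<int> d;
    f(std::front_inserter(d));  // push_front twice, so the order is 1, 0
    return d;
}

// A plain pointer is also an output iterator; no allocation happens at all.
int buffer_sum() {
    int buf[2] = {-1, -1};
    f(buf);
    return buf[0] + buf[1];
}
```

The caller decides where the elements go, so the question of copying a returned vector only arises for callers who choose the vector-returning overload.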

Steve Jessop
  • 1
    Upvoted the really best and detailed answer. However, in your swap() variant (**for C++03 without NRVO**) you still will have one copy-constructor copy made inside f(): from variable _result_ to a hidden temporary object which will be at last swapped to _myvec_. – JenyaKh Jul 06 '17 at 04:49
  • @JenyaKh: sure, that's a quality-of-implementation issue. The standard didn't require that C++03 implementations implement NRVO, just like it didn't require function inlining. The difference from function inlining is that inlining doesn't change the semantics of your program, whereas NRVO does. Portable code must work with or without NRVO. Optimised code for a particular implementation (and particular compiler flags) can seek guarantees regarding NRVO in the implementation's own documentation. – Steve Jessop Jul 23 '17 at 00:29
3

A common pre-C++11 idiom is to pass a reference to the object being filled.

Then there is no copying of the vector.

void f( std::vector<int> & result )
{
  /*
    Insert elements into result
  */
} 
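A minimal sketch of how this idiom looks at the call site, assuming an element type of `int` and assuming that `f` replaces the contents rather than appending (the original leaves both open). `capacity_is_reused` is a made-up helper that shows one side effect of the style: a vector reused across calls keeps its allocation.

```cpp
#include <vector>

// Out-parameter style: the caller owns the vector, the function fills it.
void f(std::vector<int>& result) {
    result.clear();  // assumption: f replaces the contents instead of appending
    for (int i = 0; i < 3; ++i) {
        result.push_back(i * i);
    }
}

// A vector reused across calls keeps its capacity, so the second call
// writes into the allocation made by the first one.
bool capacity_is_reused() {
    std::vector<int> buffer;
    f(buffer);
    const int* first = buffer.data();
    f(buffer);  // clear() keeps the allocation, and three elements still fit
    return buffer.data() == first && buffer.size() == 3;
}
```

The cost is a less convenient interface: the caller must declare the vector first and cannot initialize a const variable directly from the call.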
Drew Dormann
  • 5
    That is no longer an idiom in C++11. – Nawaz Mar 29 '13 at 13:52
  • 1
    @Nawaz I agree. I'm not sure what the best practice is now on SO regarding questions on C++ but not specifically C++11. I suspect I should be inclined to give C++11 answers to a student, C++03 answers to someone waist-deep in production code. Do you have an opinion? – Drew Dormann Mar 29 '13 at 13:54
  • 10
    Actually, after the release of C++11 (which is 19 months old), I consider every question to be C++11 question, unless it is explicitly stated to be C++03 question. – Nawaz Mar 29 '13 at 14:02
3

It's time I posted an answer about RVO too...

If you return an object by value, the compiler often optimizes this so it doesn't get constructed twice, since it's superfluous to construct it in the function as a temporary and then copy it. This is called return value optimization: the object is constructed directly in the storage the caller provides for the result, so it is neither copied nor moved.
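One way to observe this is with a type that counts its own copy and move constructions. The sketch below is illustrative only; `Tracer`, `make_tracer` and `count_rvo` are invented names. Since C++17 the elision is mandatory when an unnamed temporary is returned like this, so both counters stay at zero; under C++11 and C++14 it is permitted but not required, and a compiler that skips it would use the move constructor rather than the copy constructor.

```cpp
// Counts copy and move constructions (illustrative only).
struct Tracer {
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

Tracer make_tracer() {
    return Tracer();  // unnamed temporary: the classic RVO case
}

struct Counts {
    int copies;
    int moves;
};

// Resets the counters, initializes a variable from the function result,
// and reports how many copy/move constructions that took.
Counts count_rvo() {
    Tracer::copies = 0;
    Tracer::moves = 0;
    Tracer t = make_tracer();
    (void)t;
    return Counts{Tracer::copies, Tracer::moves};
}
```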

2

If the compiler supports Named Return Value Optimization (http://msdn.microsoft.com/en-us/library/ms364057(v=vs.80).aspx), you can return the vector directly, provided that none of the following applies:

  1. Different paths returning different named objects
  2. Multiple return paths (even if the same named object is returned on all paths) with EH states introduced.
  3. The named object returned is referenced in an inline asm block.

NRVO optimizes out the redundant copy constructor and destructor calls and thus improves overall performance.

There should be no real difference in your example.
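For the first restriction in the list (different paths returning different named objects), the following sketch shows what happens on a C++11 or later compiler: even where NRVO is not applied, a returned local vector is treated as an rvalue and moved, so the heap buffer is handed over rather than duplicated. `pick`, `g_chosen_buffer` and `buffer_survives` are hypothetical names used only here.

```cpp
#include <vector>

// Records the buffer address of whichever vector is about to be returned.
static const int* g_chosen_buffer = nullptr;

// Two different named objects are returned on different paths, which is one
// of the situations where NRVO may not be applied.
std::vector<int> pick(bool want_small) {
    std::vector<int> small_vec(2, 7);
    std::vector<int> large_vec(1000, 7);
    if (want_small) {
        g_chosen_buffer = small_vec.data();
        return small_vec;  // moved if not elided
    }
    g_chosen_buffer = large_vec.data();
    return large_vec;      // moved if not elided
}

// True if the caller's vector owns the very buffer that was filled in pick.
bool buffer_survives(bool want_small) {
    std::vector<int> v = pick(want_small);
    return v.data() == g_chosen_buffer;
}
```

Note that this relies on each `return` naming a local variable directly. An expression such as `return want_small ? small_vec : large_vec;` is not eligible for the implicit move and would copy.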

Lightness Races in Orbit
taocp
0
vector<string> getseq(char * db_file);

And if you want to print it in main(), you should do it in a loop.

int main(int argc, char * argv[]) {
     if (argc < 2) return 1; // a file name argument is required
     vector<string> str_vec = getseq(argv[1]);
     for(vector<string>::iterator it = str_vec.begin(); it != str_vec.end(); ++it) {
         cout << *it << endl;
     }
}
Akash Kandpal
-1

The following code works without invoking the copy constructor:

Your routine:

std::vector<unsigned char> foo()
{
    std::vector<unsigned char> v;
    v.resize(16, 0);

    return std::move(v); // forces a move, but also prevents NRVO; plain `return v;` is preferred
}

You can then use foo to obtain the vector without copying it:

std::vector<unsigned char>&& moved_v(foo()); // binds to the returned temporary and extends its lifetime

Result: moved_v has size 16 and is filled with zeros.
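To see the effect of `std::move` in the return statement, here is a rough sketch with a type that counts its move and copy constructions. `MoveCounter` and the helper functions are invented for the illustration. With `return t;` the compiler may construct the result in place; with `return std::move(t);` it has to perform at least one move, so the plain form never needs more moves than the explicit one.

```cpp
#include <utility>

// Counts copy and move constructions (illustrative only).
struct MoveCounter {
    static int copies;
    static int moves;
    MoveCounter() = default;
    MoveCounter(const MoveCounter&) { ++copies; }
    MoveCounter(MoveCounter&&) noexcept { ++moves; }
};
int MoveCounter::copies = 0;
int MoveCounter::moves = 0;

MoveCounter plain_return() {
    MoveCounter t;
    return t;             // eligible for NRVO, otherwise moved implicitly
}

MoveCounter moved_return() {
    MoveCounter t;
    return std::move(t);  // not eligible for NRVO: at least one move
}

int moves_for_plain_return() {
    MoveCounter::moves = 0;
    MoveCounter m = plain_return();
    (void)m;
    return MoveCounter::moves;
}

int moves_for_moved_return() {
    MoveCounter::moves = 0;
    MoveCounter m = moved_return();
    (void)m;
    return MoveCounter::moves;
}
```

For a std::vector a move is cheap, so the difference is small, but writing `return v;` is both shorter and never worse.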

-2

As nice as "return by value" might be, it's the kind of code that can lead one into error. Consider the following program:

    #include <string>
    #include <vector>
    #include <iostream>
    using namespace std;
    static std::vector<std::string> strings;
    std::vector<std::string> vecFunc(void) { return strings; }
    int main(int argc, char * argv[]){
      // set up the vector of strings to hold however
      // many strings the user provides on the command line
      for(int idx=1; (idx<argc); ++idx){
         strings.push_back(argv[idx]);
      }

      // now, iterate the strings and print them using the vector function
      // as accessor
      // BUG (intentional): each call to vecFunc() returns a separate temporary copy
      for(std::vector<std::string>::iterator idx=vecFunc().begin(); (idx!=vecFunc().end()); ++idx){
         cout << "Addr: " << idx->c_str() << std::endl;
         cout << "Val:  " << *idx << std::endl;
      }
      return 0;
    }
  • Q: What will happen when the above is executed? A: Undefined behavior, which in practice often shows up as a core dump.
  • Q: Why didn't the compiler catch the mistake? A: Because the program is syntactically, although not semantically, correct.
  • Q: What happens if you modify vecFunc() to return a reference? A: The program runs to completion and produces the expected result.
  • Q: What is the difference? A: The compiler does not have to create and manage anonymous objects. The programmer has instructed the compiler to use exactly one object for the iterator and for endpoint determination, rather than two different objects as the broken example does.

The above erroneous program will indicate no errors even if one uses the GNU g++ reporting options -Wall -Wextra -Weffc++

If you must produce a value, then the following would work in place of calling vecFunc() twice:

   std::vector<std::string> lclvec(vecFunc());
   for(std::vector<std::string>::iterator idx=lclvec.begin(); (idx!=lclvec.end()); ++idx)...

The above also produces no anonymous objects during iteration of the loop, but requires a possible copy operation (which, as some note, might be optimized away under some circumstances). But the reference method guarantees that no copy will be produced. Believing the compiler will perform RVO is no substitute for trying to build the most efficient code you can. If you can moot the need for the compiler to do RVO, you are ahead of the game.
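The safe alternatives can be sketched as follows, keeping `vecFunc` as a function that returns by value. The helper names `total_length_local` and `total_length_range_for` are made up, and the second variant assumes C++11 for the range-based for loop, which evaluates `vecFunc()` once and keeps the temporary alive for the whole loop.

```cpp
#include <cstddef>
#include <string>
#include <vector>

static std::vector<std::string> strings;

// Returns a copy of the global vector on every call.
std::vector<std::string> vecFunc() { return strings; }

// Option 1: call vecFunc() once and iterate over the local object, so that
// begin() and end() refer to the same container.
std::size_t total_length_local() {
    std::vector<std::string> lclvec(vecFunc());
    std::size_t total = 0;
    for (std::vector<std::string>::iterator it = lclvec.begin();
         it != lclvec.end(); ++it) {
        total += it->size();
    }
    return total;
}

// Option 2 (C++11): the range-based for loop binds the returned temporary
// once, so there is no second, unrelated container.
std::size_t total_length_range_for() {
    std::size_t total = 0;
    for (const std::string& s : vecFunc()) {
        total += s.size();
    }
    return total;
}
```

Both variants make one copy of the global vector per loop, which is the price of returning an existing long-lived object by value; returning a const reference avoids even that when the caller only needs to read.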

  • 3
    This is more of an example of what can go wrong if a user is not familiar with C++ in general. Someone that is familiar with object based languages like .net or javascript would probably assume that the string vector is always passed as a pointer and therefore in your example would always point to the same object. vecfunc().begin() and vecfunc().end() will not necessarily match in your example since they should be copies of the string vector. – Medran Oct 05 '18 at 14:39
  • Moreover, one would now write the loop as `for (auto& str: vecfunc()) { ...` so there would only be one call to vecfunc() anyway, hence no error. – Glen Whitney Nov 03 '22 at 17:11
-2
   vector<string> func1() const
   {
      vector<string> parts;
      return vector<string>(parts.begin(),parts.end()) ;
   } 

This is still efficient from C++11 onwards, as the compiler automatically moves the temporary (or elides the move entirely) instead of making a copy.

Amruth A