111

Short version: It's common to return large objects—such as vectors/arrays—in many programming languages. Is this style now acceptable in C++0x if the class has a move constructor, or do C++ programmers consider it weird/ugly/an abomination?

Long version: In C++0x is this still considered bad form?

std::vector<std::string> BuildLargeVector();
...
std::vector<std::string> v = BuildLargeVector();

The traditional version would look like this:

void BuildLargeVector(std::vector<std::string>& result);
...
std::vector<std::string> v;
BuildLargeVector(v);

In the newer version, the value returned from BuildLargeVector is an rvalue, so v would be constructed using the move constructor of std::vector, assuming (N)RVO doesn't take place.

Even prior to C++0x the first form would often be "efficient" because of (N)RVO. However, (N)RVO is at the discretion of the compiler. Now that we have rvalue references it is guaranteed that no deep copy will take place.

Edit: The question is really not about optimization. Both forms shown have near-identical performance in real-world programs. In the past, however, the first form could have order-of-magnitude worse performance. As a result, the first form was a major code smell in C++ programming for a long time. Not anymore, I hope?

James McNellis
Nate

7 Answers

77

Dave Abrahams has a pretty comprehensive analysis of the speed of passing/returning values.

Short answer: if you need to return a value, then return a value. Don't use output references, because the compiler performs the optimization anyway. Of course there are caveats, so you should read that article.

Mgetz
Peter Alexander
    Although Dave doesn't mention which compilers he tested AFAICS. – Justicle Jun 28 '10 at 18:00
  • Thanks. That article has almost the exact same example that I posted. – Nate Jun 28 '10 at 18:04
  • "compiler does it anyway": the compiler isn't required to do that == uncertainty == bad idea (need 100% certainty). "comprehensive analysis" There is a huge problem with that analysis - it relies on undocumented/non-standard language features in an unknown compiler ("Although copy elision is never required by the standard"). So even if it works, it is not a good idea to use it - there is absolutely no warranty that it will work as intended, and there is no warranty that every compiler will always work this way. Relying on this document is a bad coding practice, IMO. Even if you'll lose performance. – SigTerm Jun 28 '10 at 19:24
  • @SigTerm: That is an excellent comment!!! Most of the referenced article is too vague to even consider for use in production. People think anything from an author who's written a Red In-Depth book is gospel and should be adhered to without any further thought or analysis. ATM there isn't a compiler on the market that provides copy-elision as varied as the examples Abrahams uses in the article. – Hippicoder Jun 28 '10 at 20:35
  • @SigTerm, there's a **lot** that the compiler is not required to do, but you assume it does anyway. Compilers aren't "required" to change `x / 2` to `x >> 1` for `int`s, but you assume they will. The standard also says nothing about how compilers are required to implement references, but you assume that they are handled efficiently using pointers. The standard also says nothing about v-tables, so you can't be sure that virtual function calls are efficient either. Essentially, you need to put some faith in the compiler at times. – Peter Alexander Jun 28 '10 at 23:04
  • @Peter Alexander: "`x / 2` to `x >> 1` for `int`s, but you assume it will", "but you assume that they are handled using pointers" - actually, I don't assume that. It is *unknown* what the compiler will do, and although it is *possible* that the compiler will do what you think, it is not *guaranteed*, so it is safer to assume that the compiler will do exactly the opposite (Murphy's law). "you need to put some faith" - I don't. I prefer to "stand on solid ground". Assuming too much in programming is like juggling with live grenades - sometimes it works fine, but when it doesn't, the results are catastrophic. – SigTerm Jun 28 '10 at 23:19
  • @Peter Alexander: "you need to put some faith in the compiler at times". I can't put my faith in all existing compilers at once. And I don't want to be chained to a single development tool or platform. – SigTerm Jun 28 '10 at 23:22
  • @Sig: Very little is actually guaranteed except the actual output of your program. If you want 100% certainty about what is going to happen 100% of the time, then you're better off switching to a different language outright. – Dennis Zickefoose Jun 28 '10 at 23:54
  • @Dennis Zickefoose: "Very little is actually guaranteed except the actual output of your program." I'm with you on this. – SigTerm Jun 29 '10 at 05:43
  • @SigTerm. It is not "unknown" what compilers do. It is merely unspecified by the standard. We know very well what compilers do in many cases, and the optimisations they perform (through empirical tests). If you only go by what the standard requires, then I suppose you use `export` in all your programs (because the standard requires that... right?). No, of course not. You learn what your compiler(s) can do and you take that into account when you are programming. It's just common sense. – Peter Alexander Jun 29 '10 at 06:00
  • @Peter Alexander: "unspecified" == you're not certain == "unknown". I still believe that relying on the document you mentioned is a very BAD idea. In my experience, when you think that "the compiler will optimize things in a certain way", the compiler frequently does not behave as you expected. You can rely on the STL, and you can assume that it will work as you expect on any platform/compiler. But assuming that the compiler will not call a copy constructor when it should is extremely unwise - if you change compilers and there is no RVO/elision, ALL your code will go to hell, and you'll have to rewrite everything. – SigTerm Jun 29 '10 at 06:18
  • My faith is the amount of trust I have in the compilers I'm using, proportional to the amount of additional work it would require if I distrust them. faith = trust in compiler / amount of work to distrust. When that's greater than or equal to 1, I go with the compiler and hope it optimizes things for me. – stinky472 Jun 29 '10 at 06:23
  • @Peter Alexander: I.e. in the end it is very unwise to rely on RVO/copy elision, unless ALL C++ compilers are __required__ to implement them. I prefer to keep my sanity and avoid rewriting everything on the day when an "undocumented feature" stops working. If it is undocumented - it may be changed. "If something may go wrong, it will", so you should assume the worst-case scenario. This is what I call common sense. – SigTerm Jun 29 '10 at 06:23
  • @Peter Alexander: When I return by value, or pass arguments by value, I expect the compiler to call the copy constructor and waste CPU. When RVO/elision is in effect, it will perform better than I expected. On the contrary, when you rely on RVO/elision and they are not implemented, the program will perform worse than you expected. So, if I assume the "worst-case scenario" (there is no RVO), the program may work faster than I thought. If you assume the best-case scenario (there is RVO), the program may work slower than you thought. I may get a pleasant surprise, you may get an unpleasant one. So, not relying on RVO is safer. – SigTerm Jun 29 '10 at 06:33
  • @SigTerm: I work on the "actual-case scenario". I test what the compiler does and work with that. There is no "may work slower". It simply does not work slower, because the compiler DOES implement RVO, whether the standard requires it or not. There are no ifs, buts, or maybes, it's just simple fact. – Peter Alexander Jun 29 '10 at 06:51
  • @Peter Alexander: If you want to use undocumented language features, you can do that, but ONLY if they are documented compiler features. I.e. there is documentation provided by the compiler developer (not by a blog in the middle of nowhere) that states when and how the compiler uses RVO/elision. You should expect that this feature will be removed in the next release, or that your company will decide to switch to a compiler without that feature. "compiler DOES implement RVO" - Wrong. *Some* compilers implement RVO, and there is no warranty that all of them behave this way. – SigTerm Jun 29 '10 at 07:24
  • @Peter Alexander: I trust (more or less) official documentation only. This is what I meant when I talked about "standing on solid ground". I.e. the C++ standard, or information provided by compiler developers (Microsoft, GCC devs, etc). Your "analysis" cites neither of them. Which means it is bad advice - relying on something that "works" for unknown reasons and is bound to stop working in the future. Even tests are not always a good solution, because you're prone to human errors - you can miss a situation where the undocumented feature misbehaves. I believe this is the end of the discussion. – SigTerm Jun 29 '10 at 07:30
  • It is documented by the compilers. Go have a look for yourself. – Peter Alexander Jun 29 '10 at 08:10
  • Whilst this may theoretically answer the question, [it would be preferable](http://meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. Also the link is very dead... – Mgetz Nov 12 '14 at 17:11
  • The link is also old; compilers have probably improved since 2009. – Andrew Wagner Oct 06 '15 at 09:01
  • Responding to @SigTerm's comment from ten years ago: "Although copy elision is never required by the standard" is no longer the case; there are [cases in C++17](https://en.cppreference.com/w/cpp/language/copy_elision#Mandatory_elision_of_copy.2Fmove_operations) where it is required. Additionally, move semantics have made returning vectors very cheap, as other answers detail. – Ryan Haining Dec 26 '19 at 23:37
37

At least IMO, it's usually a poor idea, but not for efficiency reasons. It's a poor idea because the function in question should usually be written as a generic algorithm that produces its output via an iterator. Almost any code that accepts or returns a container instead of operating on iterators should be considered suspect.

Don't get me wrong: there are times it makes sense to pass around collection-like objects (e.g., strings) but for the example cited, I'd consider passing or returning the vector a poor idea.

Jerry Coffin
  • The problem with the iterator approach is that it requires you to make functions and methods templated, even when the collection element type is known. This is irritating, and when the method in question is virtual, impossible. Note, I'm not disagreeing with your answer per se, but in practice it just becomes a bit cumbersome in C++. – jon hanson Jun 28 '10 at 22:37
  • I have to disagree. Using iterators for output is sometimes appropriate, but if you aren't writing a generic algorithm, generic solutions often provide unavoidable overhead that is hard to justify. Both in terms of code complexity and actual performance. – Dennis Zickefoose Jun 28 '10 at 23:13
  • @Dennis: I have to say my experience has been quite the opposite: I write a fair number of things as templates even when I know the types involved ahead of time, because doing so is simpler and improves performance. – Jerry Coffin Jun 29 '10 at 02:00
  • I personally return a container. The intent is clear, the code is easier, I don't care much about performance when I write it (I just avoid early pessimization). I am unsure whether using an output iterator would make my intent clearer... and I need non-template code as much as possible, because in a large project dependencies kill development. – Matthieu M. Jun 29 '10 at 06:32
  • Doing so *might* be simpler, and it *might* improve performance. I posit that, in the case where you are conceptually building a container rather than writing to a range, there is no way changing the interface to operate on iterators instead of vectors will be both. Sure, the implementation of the routine will probably be simplified, but you do so by putting the onus on the caller to properly construct the container in advance. If they do it right, the code is less simple. If they do it naively, the code is less efficient. If they do it wrong, the code is less correct. – Dennis Zickefoose Jun 29 '10 at 08:26
  • @Dennis: I will posit that conceptually, you should *never* be "building a container rather than writing to a range." A container is just that -- a container. Your concern (and your code's concern) should be with the contents, not the container. – Jerry Coffin Jun 29 '10 at 15:34
  • One middle path is returning an iterator that "holds" the container: http://www.boost.org/doc/libs/1_52_0/libs/utility/shared_container_iterator.html – amit kumar Dec 05 '12 at 17:48
  • I think this is classic C++ overengineering mentality. But I can see how you would think that if you are doing distributed systems – jgleoj23 Jul 28 '21 at 07:09
19

The gist is:

Copy elision and RVO can avoid the "scary copies" (the compiler is not required to implement these optimizations, and in some situations they cannot be applied).

C++0x rvalue references allow string/vector implementations that guarantee this.

If you can abandon older compilers / STL implementations, return vectors freely (and make sure your own objects support it, too). If your code base needs to support "lesser" compilers, stick to the old style.

Unfortunately, that has a major influence on your interfaces. If C++0x is not an option and you need guarantees, you might instead use reference-counted or copy-on-write objects in some scenarios. They have downsides with multithreading, though.

(I wish just one answer in C++ would be simple and straightforward and without conditions).

peterchen
18

Indeed, since C++11, the cost of copying the std::vector is gone in most cases.

However, one should keep in mind that the cost of constructing the new vector (then destructing it) still exists, and using output parameters instead of returning by value is still useful when you desire to reuse the vector's capacity. This is documented as an exception in F.20 of the C++ Core Guidelines.

Let's compare:

std::vector<int> BuildLargeVector1(size_t vecSize) {
    return std::vector<int>(vecSize, 1);
}

with:

void BuildLargeVector2(/*out*/ std::vector<int>& v, size_t vecSize) {
    v.assign(vecSize, 1);
}

Now, suppose we need to call these methods numIter times in a tight loop, and perform some action. For example, let's compute the sum of all elements.

Using BuildLargeVector1, you would do:

size_t sum1 = 0;
for (size_t i = 0; i < numIter; ++i) {
    std::vector<int> v = BuildLargeVector1(vecSize);
    sum1 = std::accumulate(v.begin(), v.end(), sum1);
}

Using BuildLargeVector2, you would do:

size_t sum2 = 0;
std::vector<int> v;
for (size_t i = 0; i < numIter; ++i) {
    BuildLargeVector2(/*out*/ v, vecSize);
    sum2 = std::accumulate(v.begin(), v.end(), sum2);
}

In the first example, there are many unnecessary dynamic allocations/deallocations happening, which are prevented in the second example by using an output parameter the old way, reusing already allocated memory. Whether or not this optimization is worth doing depends on the relative cost of the allocation/deallocation compared to the cost of computing/mutating the values.

Benchmark

Let's play with the values of vecSize and numIter. We will keep vecSize*numIter constant so that "in theory", it should take the same time (= there is the same number of assignments and additions, with the exact same values), and the time difference can only come from the cost of allocations, deallocations, and better use of cache.

More specifically, let's use vecSize*numIter = 2^31 = 2147483648, because I have 16GB of RAM and this number ensures that no more than 8GB is allocated (sizeof(int) = 4), ensuring that I am not swapping to disk (all other programs were closed, I had ~15GB available when running the test).

Here is the code:

#include <chrono>
#include <iomanip>
#include <iostream>
#include <numeric>
#include <vector>

class Timer {
    using clock = std::chrono::steady_clock;
    using seconds = std::chrono::duration<double>;
    clock::time_point t_;

public:
    void tic() { t_ = clock::now(); }
    double toc() const { return seconds(clock::now() - t_).count(); }
};

std::vector<int> BuildLargeVector1(size_t vecSize) {
    return std::vector<int>(vecSize, 1);
}

void BuildLargeVector2(/*out*/ std::vector<int>& v, size_t vecSize) {
    v.assign(vecSize, 1);
}

int main() {
    Timer t;

    size_t vecSize = size_t(1) << 31;
    size_t numIter = 1;

    std::cout << std::setw(10) << "vecSize" << ", "
              << std::setw(10) << "numIter" << ", "
              << std::setw(10) << "time1" << ", "
              << std::setw(10) << "time2" << ", "
              << std::setw(10) << "sum1" << ", "
              << std::setw(10) << "sum2" << "\n";

    while (vecSize > 0) {

        t.tic();
        size_t sum1 = 0;
        {
            for (size_t i = 0; i < numIter; ++i) {
                std::vector<int> v = BuildLargeVector1(vecSize);
                sum1 = std::accumulate(v.begin(), v.end(), sum1);
            }
        }
        double time1 = t.toc();

        t.tic();
        size_t sum2 = 0;
        {
            std::vector<int> v;
            for (size_t i = 0; i < numIter; ++i) {
                BuildLargeVector2(/*out*/ v, vecSize);
                sum2 = std::accumulate(v.begin(), v.end(), sum2);
            }
        } // deallocate v
        double time2 = t.toc();

        std::cout << std::setw(10) << vecSize << ", "
                  << std::setw(10) << numIter << ", "
                  << std::setw(10) << std::fixed << time1 << ", "
                  << std::setw(10) << std::fixed << time2 << ", "
                  << std::setw(10) << sum1 << ", "
                  << std::setw(10) << sum2 << "\n";

        vecSize /= 2;
        numIter *= 2;
    }

    return 0;
}

And here is the result:

$ g++ -std=c++11 -O3 main.cpp && ./a.out
   vecSize,    numIter,      time1,      time2,       sum1,       sum2
2147483648,          1,   2.360384,   2.356355, 2147483648, 2147483648
1073741824,          2,   2.365807,   1.732609, 2147483648, 2147483648
 536870912,          4,   2.373231,   1.420104, 2147483648, 2147483648
 268435456,          8,   2.383480,   1.261789, 2147483648, 2147483648
 134217728,         16,   2.395904,   1.179340, 2147483648, 2147483648
  67108864,         32,   2.408513,   1.131662, 2147483648, 2147483648
  33554432,         64,   2.416114,   1.097719, 2147483648, 2147483648
  16777216,        128,   2.431061,   1.060238, 2147483648, 2147483648
   8388608,        256,   2.448200,   0.998743, 2147483648, 2147483648
   4194304,        512,   0.884540,   0.875196, 2147483648, 2147483648
   2097152,       1024,   0.712911,   0.716124, 2147483648, 2147483648
   1048576,       2048,   0.552157,   0.603028, 2147483648, 2147483648
    524288,       4096,   0.549749,   0.602881, 2147483648, 2147483648
    262144,       8192,   0.547767,   0.604248, 2147483648, 2147483648
    131072,      16384,   0.537548,   0.603802, 2147483648, 2147483648
     65536,      32768,   0.524037,   0.600768, 2147483648, 2147483648
     32768,      65536,   0.526727,   0.598521, 2147483648, 2147483648
     16384,     131072,   0.515227,   0.599254, 2147483648, 2147483648
      8192,     262144,   0.540541,   0.600642, 2147483648, 2147483648
      4096,     524288,   0.495638,   0.603396, 2147483648, 2147483648
      2048,    1048576,   0.512905,   0.609594, 2147483648, 2147483648
      1024,    2097152,   0.548257,   0.622393, 2147483648, 2147483648
       512,    4194304,   0.616906,   0.647442, 2147483648, 2147483648
       256,    8388608,   0.571628,   0.629563, 2147483648, 2147483648
       128,   16777216,   0.846666,   0.657051, 2147483648, 2147483648
        64,   33554432,   0.853286,   0.724897, 2147483648, 2147483648
        32,   67108864,   1.232520,   0.851337, 2147483648, 2147483648
        16,  134217728,   1.982755,   1.079628, 2147483648, 2147483648
         8,  268435456,   3.483588,   1.673199, 2147483648, 2147483648
         4,  536870912,   5.724022,   2.150334, 2147483648, 2147483648
         2, 1073741824,  10.285453,   3.583777, 2147483648, 2147483648
         1, 2147483648,  20.552860,   6.214054, 2147483648, 2147483648

[Plot: benchmark results, time1 and time2 vs. vecSize]

(Intel i7-7700K @ 4.20GHz; 16GB DDR4 2400Mhz; Kubuntu 18.04)

Notation: mem(v) = v.size() * sizeof(int) = v.size() * 4 on my platform.

Not surprisingly, when numIter = 1 (i.e., mem(v) = 8GB), the times are perfectly identical. Indeed, in both cases we only allocate one huge 8GB vector in memory. This also proves that no copy happened when using BuildLargeVector1(): I wouldn't have enough RAM to do the copy!

When numIter = 2, reusing the vector capacity instead of re-allocating a second vector is 1.37x faster.

When numIter = 256, reusing the vector capacity (instead of allocating/deallocating a vector over and over again 256 times...) is 2.45x faster :)

We can notice that time1 is pretty much constant from numIter = 1 to numIter = 256, which means that allocating one huge vector of 8GB is pretty much as costly as allocating 256 vectors of 32MB. However, allocating one huge vector of 8GB is definitely more expensive than allocating one vector of 32MB, so reusing the vector's capacity provides performance gains.

From numIter = 512 (mem(v) = 16MB) to numIter = 8M (mem(v) = 1kB) is the sweet spot: both methods are exactly as fast, and faster than all other combinations of numIter and vecSize. This probably has to do with the fact that the L3 cache size of my processor is 8MB, so the vector pretty much fits completely in cache. I can't really explain why the sudden jump of time1 happens at mem(v) = 16MB; it would seem more logical for it to happen just after, when mem(v) = 8MB. Note that, surprisingly, in this sweet spot not re-using capacity is in fact slightly faster! I can't really explain this either.

When numIter > 8M things start to get ugly. Both methods get slower but returning the vector by value gets even slower. In the worst case, with a vector containing only one single int, reusing capacity instead of returning by value is 3.3x faster. Presumably, this is due to the fixed costs of malloc() which start to dominate.

Note how the curve for time2 is smoother than the curve for time1: not only is re-using vector capacity generally faster, but perhaps more importantly, it is more predictable.

Also note that in the sweet spot we were able to perform 2 billion additions of 64-bit integers in ~0.5s, which is quite optimal on a 4.2GHz 64-bit processor. We could do better by parallelizing the computation in order to use all 8 cores (the test above only uses one core at a time, which I have verified by re-running the test while monitoring CPU usage). The best performance is achieved when mem(v) = 16kB, which is the order of magnitude of the L1 cache (the L1 data cache of the i7-7700K is 4x32kB).

Of course, the differences become less and less relevant the more computation you actually have to do on the data. Below are the results if we replace sum = std::accumulate(v.begin(), v.end(), sum); by for (int k : v) sum += std::sqrt(2.0*k);:

[Plot: benchmark 2, same comparison with the sqrt computation]

Conclusions

  1. Using output parameters instead of returning by value may provide performance gains by re-using capacity.
  2. On a modern desktop computer, this seems only applicable to large vectors (>16MB) and small vectors (<1kB).
  3. Avoid allocating millions/billions of small vectors (< 1kB). If possible, re-use capacity, or better yet, design your architecture differently.

Results may differ on other platforms. As usual, if performance matters, write benchmarks for your specific use case.

Boris Dalstein
5

I still think it is a bad practice, but it's worth noting that my team uses MSVC 2008 and GCC 4.1, so we're not using the latest compilers.

Previously a lot of the hotspots shown in vtune with MSVC 2008 came down to string copying. We had code like this:

String Something::id() const
{
    return valid() ? m_id : "";
}

... note that we used our own String type (this was required because we're providing a software development kit where plugin writers could be using different compilers and therefore different, incompatible implementations of std::string/std::wstring).

I made a simple change in response to the call graph sampling profiling session showing String::String(const String&) to be taking up a significant amount of time. Methods like in the above example were the greatest contributors (actually the profiling session showed memory allocation and deallocation to be one of the biggest hotspots, with the String copy constructor being the primary contributor for the allocations).

The change I made was simple:

static String null_string;
const String& Something::id() const
{
    return valid() ? m_id : null_string;
}

Yet this made a world of difference! The hotspot went away in subsequent profiler sessions, and in addition to this we do a lot of thorough unit testing to keep track of our application performance. All kinds of performance test times dropped significantly after these simple changes.

Conclusion: we're not using the absolute latest compilers, but we still can't seem to depend on the compiler optimizing away the copying for returning by value reliably (at least not in all cases). That may not be the case for those using newer compilers like MSVC 2010. I'm looking forward to when we can use C++0x and simply use rvalue references and not ever have to worry that we're pessimizing our code by returning complex classes by value.

[Edit] As Nate pointed out, RVO applies to returning temporaries created inside of a function. In my case, there were no such temporaries (except for the invalid branch where we construct an empty string) and thus RVO would not have been applicable.

stinky472
  • That's the thing: RVO is compiler-dependent, but a C++0x compiler *must* use move semantics if it decides not to use RVO (assuming there's a move constructor). Using the trigraph operator defeats RVO. See http://cpp-next.com/archive/2009/09/move-it-with-rvalue-references/ which Peter referred to. But your example is not eligible for move semantics anyway because you're not returning a temporary. – Nate Jun 28 '10 at 18:43
  • @Stinky472: Returning a member by value was always going to be slower than reference. Rvalue references would still be slower than returning a reference to the original member (if the caller can take a reference instead of needing a copy). In addition, there are still many times that you can save, over rvalue references, because you have context. For example, you can do String newstring; newstring.resize(string1.size() + string2.size() + ...); newstring += string1; newstring += string2; etc. This is still a substantial saving over rvalues. – Puppy Jun 28 '10 at 19:00
  • @DeadMG a substantial saving over binary operator+ even with C++0x compilers implementing RVO? If so, that's a shame. Then again that makes sense since we still end up having to create a temporary to compute the concatenated string, whereas += can concatenate directly to newstring. – stinky472 Jun 28 '10 at 19:12
  • How about a case like: string newstr = str1 + str2; On a compiler implementing move semantics, it seems like that should be as fast as or even faster than: string newstr; newstr += str1; newstr += str2; No reserve, so to speak (I'm assuming you meant reserve instead of resize). – stinky472 Jun 28 '10 at 19:17
  • @Nate: I think you are confusing *trigraphs* like `<::` or `??!` with the *conditional operator* `?:` (sometimes called the *ternary operator*). – fredoverflow Jun 28 '10 at 19:20
  • @stinky472: For just two strings, it's no different. For more, it's substantial, especially if you want a system that performs acceptably on C++03 compilers. In my opinion, compilers should have the freedom to do whatever they want with rvalues. If the Standard would slacken the rules, we might be able to do even better. – Puppy Jun 28 '10 at 19:59
3

Just to nitpick a little: it is not common in many programming languages to return arrays from functions. In most of them, a reference to the array is returned. In C++, the closest analogy would be returning boost::shared_array.

Nemanja Trifunovic
  • True. Behind the scenes it's dealing with references, but in a language like Python or PHP or some variants of BASIC that detail is hidden from you. – Nate Jun 28 '10 at 18:13
  • A `std::vector` is like a reference to an array -- the pointer to the actual array block is stored in the vector object. That pointer is going to be move constructed (C++0x)/NRVO'd (C++03) for the calling function, and therefore it's no different than passing a reference. – Billy ONeal Jun 28 '10 at 18:14
  • @Billy: std::vector is a value type with copy semantics. The current C++ standard offers no guarantees that (N)RVO ever gets applied, and in practice there are many real-life scenarios when it is not. – Nemanja Trifunovic Jun 28 '10 at 18:20
  • @Nemanja: Yes, the standard does not require it, but most compilers in modern use are going to do it. – Billy ONeal Jun 28 '10 at 18:21
  • @Billy: Again, there are some very real scenarios where even the latest compilers don't apply NRVO: http://www.efnetcpp.org/wiki/Return_value_optimization#Named_RVO – Nemanja Trifunovic Jun 28 '10 at 18:36
  • @Nemanja: That does not change the fact that in 99% of cases, you can treat the returned vector as if you returned a reference type. If you profile and see a lot of copies from returns, then you can think about changing that sort of thing. – Billy ONeal Jun 28 '10 at 21:20
  • @Billy ONeal: 99% is not enough, you need 100%. Murphy's law - "if something can go wrong, it will". Uncertainty is fine if you're dealing with some kind of fuzzy logic, but it is not a good idea for writing traditional software. If there is even a 1% possibility that code does not work the way you think, then you should expect this code to introduce a critical bug that will get you fired. Plus, it is not a standard feature. Using undocumented features is a bad idea - if one year from now the compiler drops the feature (it isn't _required_ by the standard, right?), you'll be the one in trouble. – SigTerm Jun 28 '10 at 23:02
  • @SigTerm: If we were talking about correctness of behavior, I would agree with you. However, we are talking about a performance optimization. Such things are fine with less than 100% certainty. – Billy ONeal Jun 28 '10 at 23:03
  • @Billy: How did you come up with the 99% number? Anyway, even if I profile it with one compiler, it does not mean another compiler will perform the same optimizations. In a nutshell, I would say that relying on RVO is safe enough these days, but not NRVO. – Nemanja Trifunovic Jun 29 '10 at 12:45
  • @Nemanja: I don't see what's being "relied upon" here. Your app runs the same no matter whether RVO or NRVO is used. If they're used, though, it will run faster. If your app is too slow on a particular platform and you traced it back to return value copying, then by all means change it, but that does not change the fact that the best practice is still to use the return value. If you absolutely need to ensure no copying occurs, wrap the vector in a `shared_ptr` and call it a day. – Billy ONeal Jun 29 '10 at 13:26
  • @Billy: The app may or may not "run the same" - if the vector is big enough, it may lead to unacceptable performance or even a crash. Of course, there are cases when you are certain the vector content is small enough not to cause any such problems, but in general I wouldn't just copy vectors blindly and hope that the compiler(s) perform copy elision for me. – Nemanja Trifunovic Jun 29 '10 at 13:41
2

If performance is a real issue, you should realise that move semantics aren't always faster than copying. For example, if you have a string that uses the small string optimization, then for small strings a move constructor must do the exact same amount of work as a regular copy constructor.

Motti