Let me micro-optimize your second version of f() and call it g():
#include <cstdio>
#include <string>

using namespace std;

string f(const string& s) {
    return s + "some text";
}

void g(const string& s, string& result) {
    result.clear();
    result += s;
    result += "some text";
}
Now, let's compare the return-by-value approach f() to the "out-parameter" approach g().
Return by value:
int main(int argc, char* argv[]) {
    string s(argv[1]);
    for (int i = 0; i < 10; ++i) {
        string temp = f(s); // at least 1 memory allocation in each iteration, ouch!
        fprintf(stderr, "%s\n", temp.c_str());
    }
}
In each iteration, there is a memory allocation. The total number of allocations is the number of iterations + 1 (the extra one comes from constructing s from argv[1]), that is, 11 in this case.
The "out-parameter" approach:
int main(int argc, char* argv[]) {
    string s(argv[1]);
    string temp; // note that this time, it is outside the loop
    for (int i = 0; i < 10; ++i) {
        g(s, temp);
        fprintf(stderr, "%s\n", temp.c_str());
    }
}
In this case, you get 3 memory allocations (assuming the buffer of temp doesn't need to be re-allocated inside the loop), even if you iterate 1,000,000 times! That is a significant improvement over the return-by-value approach.
Returning by value and relying on copy elision or on move semantics is good advice, but as the example shows, there are situations in which the out-parameter approach wins (e.g. when you can re-use a buffer).
The danger with out-parameters is that it must be obvious at the call site, just by looking at the code, that the function modifies some of its arguments. The name of the function should strongly suggest that it mutates some of its arguments; otherwise you get surprising results... :(
If you find this example too twisted, well, it isn't: think of std::getline()!
And for those who think it is premature optimization: in the case of std::getline() it certainly isn't! If you shove the lines of a file into a std::vector and allocate a new string for each line, it will be 1.6x slower than the out-parameter approach (with lines of 80 bytes). It sounds crazy, as the file IO should be the bottleneck, but it isn't: it is the unnecessary memory allocations. For details, see Andrei Alexandrescu: Writing Quick Code in C++, Quickly, at around 48 min.
UPDATE:
R. Martinho Fernandes kindly pointed out below in the comments that his measurements with gcc contradict my results but are in agreement with my claims with clang and libc++; see GCC and Clang.
After he pointed these out, I made measurements on Andrei Alexandrescu's example. At the moment, I cannot reproduce his results; it needs further analysis to understand what is happening under the hood. Please be patient and give me some time to clear up the inconsistencies.
The take-away of this story is to always measure. I did measure the number of memory allocations mentioned in this answer, and that part still holds (at least on my machine).