
For returning a string out of a function, which of these two is more efficient (i.e. which one should I be using):

std::string f(const std::string& s)
{
    return s + "some text";
}

or

void f(const std::string& s, std::string &result)
{
    result = s + "some text";
}

I understand that the answer may depend on the particular compiler. But I want to know what the recommended approach (if there is one) is in modern C++ code.

Based on the comment by "Lightness Races in Orbit" below, here are some related questions that I found on Stack Overflow before I asked this question:

Are the days of passing const std::string & as a parameter over?

Passing std::string by Value or Reference

Pass by value or const reference?

"std::string" or "const std::string&" argument? (the argument is internally copied and modified)

None of them answers my particular question: returning a value from a function versus returning the string through an extra argument.

Ondřej Čertík
  • C++11 return by value, cuz moved if not elided – IdeaHat Feb 07 '14 at 19:36
  • @LightnessRacesinOrbit, thank you for your feedback. I have updated the question with references to other answers on this site. – Ondřej Čertík Feb 07 '14 at 19:47
  • Um yes they do answer your question. – Lightness Races in Orbit Feb 07 '14 at 19:48
  • @LightnessRacesinOrbit --- is returning `std::string` from a function equivalent to returning `std::string` by *value* as an argument? If so, then they indeed do answer my question. But I thought it is not equivalent. – Ondřej Čertík Feb 07 '14 at 19:52
  • What does `returning std::string by value as an argument` mean? – Lightness Races in Orbit Feb 07 '14 at 19:55
  • @MadScienceDreams Well, the picture is not so black and white, see my answer. – Ali Feb 07 '14 at 20:44
  • @LightnessRacesinOrbit Yes, the question was worded somewhat unfortunately, but it is an interesting question! Please read my answer. – Ali Feb 07 '14 at 20:45
  • @Ali Your point is valid, but you should add at least a nod to "Premature optimization is the root of all evil". – IdeaHat Feb 07 '14 at 20:46
  • @LightnessRacesinOrbit, by `returning std::string by value as an argument` I mean `void f(std::string result)`, where the function returns the new string in `result`. In other words, assuming these two are the same, then my question becomes "is it better to use `void f(std::string result)` or `void f(std::string &result)` for return arguments" and that has indeed been answered. – Ondřej Čertík Feb 07 '14 at 20:59
  • @MadScienceDreams Not quite, please check my updated answer! – Ali Feb 07 '14 at 21:00
  • @LightnessRacesinOrbit, I just realized thanks to Ali and Michael below, that it is of course not equivalent, since you can't return the string by `f(std::string result)`. So I think my question hasn't yet been answered on Stack Overflow. But Ali has provided a great answer below. – Ondřej Čertík Feb 07 '14 at 21:16
  • @OndřejČertík: You can't "return" anything like that. If you want to send a value back to the calling scope, that's fine, but taking a copy ain't gonna help you there... – Lightness Races in Orbit Feb 08 '14 at 01:43
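To make the point from the comments above concrete, here is a minimal sketch (the helper names `append_by_value` and `append_by_ref` are hypothetical) showing that a parameter taken by value cannot carry a result back to the caller, while a reference parameter can:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: `result` is taken by value, so the function
// works on its own copy and the caller never sees the assignment.
void append_by_value(const std::string& s, std::string result) {
    result = s + "some text";  // changes the local copy only
}

// Hypothetical helper: `result` refers to the caller's string,
// so the assignment is visible after the call returns.
void append_by_ref(const std::string& s, std::string& result) {
    result = s + "some text";  // changes the caller's object
}
```

With `std::string out = "unchanged";`, calling `append_by_value("abc ", out)` leaves `out` as `"unchanged"`, whereas `append_by_ref("abc ", out)` sets it to `"abc some text"`.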

3 Answers


Let me micro-optimize your second version of f() and call it g():

#include <cstdio>
#include <string>
using namespace std;

string f(const string& s) {
    return s + "some text";
}

void g(const string& s, string &result) {
    result.clear();
    result += s;
    result += "some text";
}

Now, let's compare the return by value approach f() to the "out-parameter" approach g().

Return by value:

int main(int argc, char* argv[]) {

    if (argc < 2) return 1; // expects the input string as the first argument

    string s(argv[1]);

    for (int i=0; i<10; ++i) {

      string temp = f(s); // at least 1 memory allocation in each iteration, ouch!

      fprintf(stderr, "%s\n", temp.c_str());
    }
}

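In each iteration, there is a memory allocation (at least one, assuming the result is too long to fit in the string's small-buffer optimization). Counting one per iteration plus one for s, the total number of allocations will be the number of iterations + 1, that is, 11 in this case.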

The "out-parameter" approach:

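int main(int argc, char* argv[]) {

    if (argc < 2) return 1; // expects the input string as the first argument

    string s(argv[1]);

    string temp; // note that this time, it is outside the loop

    for (int i=0; i<10; ++i) {

      g(s, temp); // reuses temp's buffer once it is large enough

      fprintf(stderr, "%s\n", temp.c_str());
    }
}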

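In this case, you get only 3 memory allocations in total (presumably one for s and two while the buffer of temp grows during the first call; after that, the buffer is reused as long as it doesn't need to be re-allocated inside the loop), even if you iterate 1000000 times! That is a significant improvement over the return by value approach.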
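One way to check allocation counts like these without an external tool is to replace the global `operator new` with a counting version, which standard C++ permits. The sketch below assumes that approach; the helper names `count_by_value` and `count_out_param` are hypothetical, and the exact numbers depend on the standard library and on whether the strings fit in the small-string buffer, so treat it as a measuring aid rather than a definitive result:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

// Global counter bumped by the replacement operator new below.
static std::size_t g_allocations = 0;

// Replacing the global allocation functions is a standard C++ facility.
void* operator new(std::size_t n) {
    ++g_allocations;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

std::string f(const std::string& s) { return s + "some text"; }

void g(const std::string& s, std::string& result) {
    result.clear();  // empties the string; implementations keep the capacity
    result += s;
    result += "some text";
}

// Allocations made by `iterations` calls of f(), a fresh string each time.
std::size_t count_by_value(const std::string& s, int iterations,
                           std::size_t& checksum) {
    std::size_t before = g_allocations;
    for (int i = 0; i < iterations; ++i) {
        std::string temp = f(s);
        checksum += temp.size();  // use the result so it is not discarded
    }
    return g_allocations - before;
}

// Allocations made by `iterations` calls of g(), reusing one buffer.
std::size_t count_out_param(const std::string& s, int iterations,
                            std::size_t& checksum) {
    std::size_t before = g_allocations;
    std::string temp;  // declared outside the loop, as in the answer
    for (int i = 0; i < iterations; ++i) {
        g(s, temp);
        checksum += temp.size();
    }
    return g_allocations - before;
}
```

With an input longer than the small-string buffer (say 100 characters) and 1000 iterations, one would expect `count_by_value` to report at least 1000 allocations and `count_out_param` only a couple; the numbers on your own platform are what count.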

Returning by value and relying on copy elision or on move semantics is good advice, but as the example shows, there are situations in which the out-parameter approach wins (e.g. when you can reuse a buffer).

The catch with out-parameters is that it must be obvious at the call site, just by looking at the code, that the function modifies some of its arguments: the name of the function must strongly suggest that it mutates them. Otherwise you get surprising results... :(

If you find this example too twisted, well, it isn't: Think of std::getline()!
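`std::getline` is exactly such an out-parameter API: it overwrites its `std::string` argument and returns the stream, so a single buffer can serve every line. A minimal sketch (the function names are hypothetical), with a vector-of-lines variant for contrast, where every stored line needs a string object of its own:

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reuses one buffer: std::getline overwrites `line` on every call,
// so its capacity carries over from line to line.
std::size_t total_length_reusing(std::istream& in) {
    std::size_t total = 0;
    std::string line;
    while (std::getline(in, line)) {
        total += line.size();
    }
    return total;
}

// Keeps every line: each push_back copies `line` into a new string
// object, so sufficiently long lines each get their own allocation.
std::vector<std::string> read_all_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```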

And for those who think it is premature optimization: in the case of std::getline() it certainly isn't! If you shove the lines of a file into a std::vector and allocate a new string for each line, it will be 1.6x slower than the out-parameter approach (with lines of 80 bytes). It sounds crazy, since file I/O should be the bottleneck, but it isn't: the unnecessary memory allocations are. For details, see Andrei Alexandrescu: Writing Quick Code in C++, Quickly, at around 48 min.


UPDATE:

  1. R. Martinho Fernandes kindly pointed out below in the comments that his measurements with GCC contradict my claims, while his measurements with Clang and libc++ agree with them; see his GCC and Clang results linked in the comments.

  2. After he pointed these out, I made measurements on Andrei Alexandrescu's example. At the moment, I cannot reproduce his results; further analysis is needed to understand what is happening under the hood.

Please be patient and give me some time to clear up the inconsistencies.

The take-away of this story is to always measure. I did measure the number of memory allocations mentioned in the answer; those numbers still hold (at least on my machine).
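In that spirit, a bare-bones timing sketch using `std::chrono::steady_clock` might look like the following. It is only a starting point under stated assumptions (hypothetical names, a synthetic input, no warm-up or repetition); the checksum reads every produced string so the optimizer cannot simply drop the work, and real conclusions need the real workload, repeated runs, and the actual compiler flags:

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

std::string f(const std::string& s) { return s + "some text"; }

void g(const std::string& s, std::string& result) {
    result.clear();
    result += s;
    result += "some text";
}

// Elapsed wall-clock time plus a checksum over the produced strings.
struct Timing {
    std::chrono::nanoseconds elapsed;
    std::size_t checksum;
};

Timing time_by_value(const std::string& s, int iterations) {
    std::size_t checksum = 0;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::string temp = f(s);  // new string every iteration
        checksum += temp.size() + static_cast<unsigned char>(temp[i % temp.size()]);
    }
    auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start), checksum};
}

Timing time_out_param(const std::string& s, int iterations) {
    std::size_t checksum = 0;
    std::string temp;  // buffer reused across iterations
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        g(s, temp);
        checksum += temp.size() + static_cast<unsigned char>(temp[i % temp.size()]);
    }
    auto stop = std::chrono::steady_clock::now();
    return {std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start), checksum};
}
```

Call both with the same input and iteration count, check that the checksums agree, compare `elapsed.count()`, and repeat the runs several times before drawing conclusions.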

Ali
  • Ali, thank you for this awesome answer. It answers my question, but I have one complementary question to you --- would your conclusion about the "Return by value" change if the function was instead defined as `void f(const std::string& s, std::string result)`, so it still returns the result by value, but this time as an argument, not the function result. The reason I am asking this is that then we connect our answers with all the other questions answered here on stackexchange. – Ondřej Čertík Feb 07 '14 at 21:03
  • @Ondřej Čertík: If your function was defined the way you suggest, assignments to `result` would only affect the local copy passed into that function by value. Those changes would be lost as soon as the function returns, so you are not returning anything in that case. – Michael Karcher Feb 07 '14 at 21:05
  • No, my answer won't change. The `void f(const std::string& s, std::string result)` doesn't return anything. The caller will see an unchanged result because `f()` worked on a *copy* in this case, and after `f()` returned, the caller sees his original `result`. You must pass by reference (`string& result`) or you don't see what `f()` has done to `result`. – Ali Feb 07 '14 at 21:06
  • Ali and Michael, that's of course right! I think this fully answers my question. Thanks again for your time, I really appreciate it. – Ondřej Čertík Feb 07 '14 at 21:11
  • RE: get_lines, ranges ftw: http://ericniebler.com/2013/11/07/input-iterators-vs-input-ranges/ – TemplateRex Feb 07 '14 at 22:18
  • @TemplateRex Thanks! I will read it during the week-end; I will need some time to digest it. – Ali Feb 07 '14 at 22:38
  • My measurements seem to contradict your assertions about the performance benefits of out-parameters (https://dl.dropboxusercontent.com/u/13779444/bench/by-val-by-ref-2.html) – R. Martinho Fernandes Feb 19 '14 at 10:31
  • @R.MartinhoFernandes Could you expand on this a little bit, please? Where are the source codes of your experiments? Hard disk, SSD or memory mapped files? On what machine / operation system did you run your codes? Which compiler and optimization flags? What was the input? I could go on but long story short, I need more details. Note: I just referred to Alexandrescu's experiments, I did *not* perform measurements myself on this particular example. – Ali Feb 19 '14 at 11:11
  • The code is linked. Regardless, I find it mildly funny that you refer to Alexandrescu's talk and *did not* measure. I thought he had been clear enough about that part in the talk, but maybe not. – R. Martinho Fernandes Feb 19 '14 at 11:13
  • @R.MartinhoFernandes I did not perform measurements myself **on this particular example.** I did get bitten on Windows with one application: It was 5-20x slower(!) on Windows than on Linux, and after profiling, it turned out that the string allocation and reallocation makes it miserably slow. What Alexandrescu says is in agreement with what I saw and did not have any reason to question his findings (at least so far). Maybe the tide is changing. OK, please give me some time; I do measurements myself. I still need details: OS, compiler, optimization flags. – Ali Feb 19 '14 at 11:23
  • The tide is always changing. That's why Andrei says that you should always **measure**. I ran that on Linux with GCC 4.8, -O3 -flto. FWIW, results for the same code with clang 3.4 and libc++: https://dl.dropboxusercontent.com/u/13779444/bench/by-val-by-ref-3.html. You don't need to measure; all you need to do is remove the sense of absolute certainty that pervades this post because it's all maybes and as Alexandrescu says you should *always* measure. – R. Martinho Fernandes Feb 19 '14 at 11:32
  • @R.MartinhoFernandes "You don't need to measure; [...] as Alexandrescu says you should always measure." That seems to be a little self-contradictory. :) Anyway, I am already halfway. As I said earlier, I got bitten on Windows in a similar case. – Ali Feb 19 '14 at 11:40
  • What I meant is that you don't need to measure for the purpose of writing an answer, because you don't know if the results will match what the OP (or anyone else looking at the answer) would get; but in order to make a decision about performance between the two options, one must measure. – R. Martinho Fernandes Feb 19 '14 at 11:43
  • @R.MartinhoFernandes Please read my update. Do you agree with the answer up until the part that is now struck out? If not, what changes do you suggest, what problems do you see still? As for Andrei's example, I cannot reproduce that; strange things are going on. Unfortunately, I have other things to do at the moment; I can only get back to this issue later but I definitely will. – Ali Feb 19 '14 at 13:08
  • Hi, sorry to butt in, but note that returning by value does not preclude reusing a buffer. With `string h( string const & s, string && result_hint = {} ) { … return result_hint; }`, you can refactor the best-default-practice return by value code to reap the benefits of the other style without redoing everything. Adding an ignored hint to the other alternative, the interfaces would be mutually compatible, and you could select the better performing one for each platform using preprocessor `#if`. – Potatoswatter Feb 19 '14 at 13:12
  • @R.MartinhoFernandes One more question: how many allocations do you get *on your machine* for the codes I posted? Is that different from mine? Because *I did measure* that before posting the answer and that huge difference in the number of memory allocations should matter. – Ali Feb 19 '14 at 13:12
  • @Potatoswatter Thanks for the note. As I say now in the update, there are strange things going on under the hood and I will need some time to understand why. Unfortunately, at the moment I am busy doing something else... – Ali Feb 19 '14 at 13:15
  • @Sorry, I've edited my comment a few times… please reload. I think I have a solution to the whole dilemma, so you don't need to go measure on every platform :) – Potatoswatter Feb 19 '14 at 13:16
  • @Potatoswatter Thanks. Unfortunately, when I tried to reproduce Andrei's example, strange things started to happen. I do not understand the behavior of the code. I suspect some of the inline assembly macros in the string header are interfering. I really need some time to analyze the situation, it is far from trivial... :( – Ali Feb 19 '14 at 13:22
  • @Ali I wrote up my idea, see below. No need to bog down in assembly and nontrivial trivialities, you can have it both ways and switch as testing dictates. Measurement isn't really supposed to be done on artificial benchmarks anyway. :) – Potatoswatter Feb 19 '14 at 13:31
  • @R.MartinhoFernandes I cannot reproduce your results. It is either because your micro-benchmarks are too simple and the compiler noticed that you were doing the same thing over and over again and did something overly clever or there is some fundamental difference between gcc 4.7.2 and gcc 4.8. – Ali Feb 19 '14 at 17:25
  • @R.MartinhoFernandes So please run [a more realistic benchmark](https://gist.github.com/baharev/cd41bf3878f663792146). It is hopefully sufficiently difficult that the compiler cannot know at compile time that we are doing the same thing over and over again. I have run this on my machine with both input (data.txt) and output (temp.txt) files memory mapped, with 10^7 lines (620MB). The out parameter approach is consistently faster than the by value approach (1810ms vs. 2340ms) and the noise in the timings is less than 60ms (5 runs). – Ali Feb 19 '14 at 17:25
  • @R.MartinhoFernandes As I claim in the answer, this difference is because of the unnecessary memory allocations. I triple-checked: in the pass by reference approach we reuse the buffer but in the pass by value approach a new string is allocated every time. I don't see any other reason that would explain the significant difference in timings. – Ali Feb 19 '14 at 17:26
  • @R.MartinhoFernandes As for Alexandrescu's example, it is still crossed out in the answer. I cannot reproduce his results, very very strange things are happening. So please give me some time regarding that one. In the meantime, I would like to know what happens on your machine with my benchmarks. I am very curious how many allocations you get per iteration. – Ali Feb 19 '14 at 17:29
  • FWIW I have another machine with GCC 4.8.2 (as opposed to 4.8.1) where the results agree with clang's. I checked the generated assembly to make sure compiler is not optimising our loop away. It isn't. (The code is written to prevent that with `volatile`, but GCC doesn't drop the loop even without `volatile`). The variety of results makes it quite clear to me that 'it depends; check for yourself' is the most useful answer here, but I'll try it out later to appease your curiosity :) – R. Martinho Fernandes Feb 19 '14 at 18:26
  • @R.MartinhoFernandes Thanks! *"compiler is not optimising our loop away. It isn't. (The code is written to prevent that with `volatile`"* Yes, I saw that. I am still not sure the compiler isn't doing something deviously clever. It doesn't make any sense: Those extra allocations must matter and must make a difference. Whether it matters that much as Andrei claims, I am not sure anymore. In any case, on that machine where the by value version was faster, please run my benchmark. If pass by value is still faster, please check the number of memory allocations per iteration. I am going crazy. :"( – Ali Feb 19 '14 at 22:09

The first alternative, return s + "some text";, is simpler. Its behavior in terms of memory allocation is also simple: first s + "some text" is evaluated, presumably allocating a new string object with sufficient capacity to hold the result. That object becomes the return value if copy elision applies; otherwise it is moved.
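The copy elision part can be observed with a small hypothetical type that counts its own copies and moves. This sketch assumes C++17, where returning a prvalue is guaranteed to be elided; returning a named local (NRVO) may be elided, and if it is not, the local is moved rather than copied:

```cpp
#include <cassert>

// Hypothetical type that counts how often it is copied or moved.
struct Tracer {
    static int copies;
    static int moves;
    Tracer() = default;
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
};
int Tracer::copies = 0;
int Tracer::moves = 0;

// Returns a prvalue, like `return s + "some text";`: since C++17 the
// result is constructed directly in the caller's object.
Tracer make_prvalue() { return Tracer{}; }

// Returns a named local: elision is allowed but not required;
// without it, the local is moved, never copied.
Tracer make_named() {
    Tracer t;
    return t;
}
```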

The second interface, as Ali notes, gives the user an opportunity to reuse a string buffer over several calls. Taking advantage of that requires a bit more code and a bit more complexity.

Furthermore, according to the measurements discussed in the comments on his answer, it's hard to tell which one really wins in general. Fortunately, there is a middle path:

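#include <string>
#include <utility>
using std::string;

#if STRING_BUFFER_REUSE_OPTIMIZATION

// Reuse the caller-supplied buffer: clear it (keeping its capacity),
// fill it, and move it back out as the return value.
string h( string const & s, string && result = {} ) {
    result.clear();
    result += s;
    result += "some text";
    return std::move( result );
}

#else

string const no_hint = {};

// Plain return by value; the hint is accepted but ignored, so call
// sites look the same for both variants.
string h( string const & s, string const & /*hint*/ = no_hint ) {
    return s + "some text";
}

#endif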

With this, you can set the STRING_BUFFER_REUSE_OPTIMIZATION macro according to the measurements du jour on each build target. Both allocation strategies are adapted to the same interface with no sacrifices.
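At the call site, reuse means passing the previous result back in as the hint. A minimal sketch of the reuse variant (`repeat_with_hint` is a hypothetical helper): in common standard library implementations, the move construction of the return value and the move assignment back into `buf` hand the same heap buffer around instead of allocating a new one on each call.

```cpp
#include <cassert>
#include <string>
#include <utility>

using std::string;

// Reuse variant of h(), as in the answer with the macro enabled.
string h(string const& s, string&& result = {}) {
    result.clear();
    result += s;
    result += "some text";
    return std::move(result);
}

// Hypothetical helper: feeds the previous result back in as the buffer.
string repeat_with_hint(string const& s, int iterations) {
    string buf;
    for (int i = 0; i < iterations; ++i) {
        buf = h(s, std::move(buf));  // buffer travels in and back out
    }
    return buf;
}
```

Because the reuse variant takes `string&&`, callers must write `std::move(buf)` (a plain lvalue will not bind); the other variant takes `string const&` and accepts the same call, which is what keeps the two interchangeable.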

Potatoswatter
  • +1 from me. Note: the further measurements were done by R. Martinho Fernandes; he pointed out that his timings contradict my claims. I still must understand the weird behavior of the code. Yours is a nice answer and answers the question as well, upvoted! – Ali Feb 19 '14 at 13:42
  • In case you are interested: [I cannot reproduce R. Martinho Fernandes' timings](http://stackoverflow.com/questions/21636248/should-stdstring-be-returned-by-value-from-a-function-or-by-stdstring-s-a#comment33143280_21637284). I still don't see it proved that the pass by value approach could be faster in this case. – Ali Feb 19 '14 at 17:32

For returning a newly created string, I would definitely go with the return-by-value approach. The typical compiler implementation of returning objects by value has the caller allocate space for the object and pass the function a pointer to that space. This is essentially the same as your reference parameter, with one important difference: the pass-by-reference output parameter requires a reference to an already fully constructed string, which the function then overwrites with the result, while in the return-by-value case the function constructs the object itself.
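The mechanism described above can be pictured at the C++ level with placement new. This is a conceptual sketch only, not what any particular compiler emits, and `f_lowered` and `call_lowered` are hypothetical names: the callee constructs the result in raw storage provided by the caller, whereas the out-parameter version assigns into a string that already exists.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>

// Conceptual model of return by value: the caller passes the address of
// uninitialized storage, and the callee constructs (not assigns) into it.
std::string* f_lowered(const std::string& s, void* return_slot) {
    return ::new (return_slot) std::string(s + "some text");
}

// Out-parameter style: `result` must already be a fully constructed
// string, which the callee overwrites by assignment.
void g(const std::string& s, std::string& result) {
    result = s + "some text";
}

// Caller side of the model: provide aligned raw storage, let the callee
// construct into it, use the object, then end its lifetime explicitly.
std::string call_lowered(const std::string& s) {
    alignas(std::string) unsigned char storage[sizeof(std::string)];
    std::string* p = f_lowered(s, storage);
    std::string value = *p;
    std::destroy_at(p);
    return value;
}
```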

Note that there is one specific use case in which the pass-by-reference solution is faster: If a caller calls this function repeatedly to change the same variable, the overwrite inside the function is exactly what is needed, while returning and assigning in the caller would cause the result to be constructed in a temporary which gets (move) assigned to the variable on the caller side. If you use pre-C++11 compilers, it even gets copy-assigned.

Michael Karcher
  • Thanks Michael for the answer. I think it is consistent with Ali's answer. I accepted his, as it provides detailed code. I hope that is OK. – Ondřej Čertík Feb 07 '14 at 21:14