Efficiency in a C++ function?

Question

This is probably a simple question, but this came across my mind. It is regarding the difference between the two functions below:

T func_one(T obj) {      //for the purpose of this question, 
    return obj + obj;    //T is a large object and has an overloaded '+' operator 
}

T func_two(T obj) {
    T output = obj + obj;
    return output;
}

In func_one(), rather than creating an object T, assigning it a value, and then returning the object, I just return the value itself without creating a new named object. If T were a large object, would func_one() be more efficient than func_two(), or does func_one() create a T object anyway when returning the sum of the two objects?

You shouldn't be adding and returning large objects like this. — Mikhail, Nov 26 '13 at 02:29
Any modern optimizing compiler will generate similar assembly for both. Also, with RVO and move semantics, this may very well be the most efficient way of accomplishing this. The only way to know, is to profile. — Chad, Nov 26 '13 at 03:19

score 0 · Answer 1 · answered Nov 26 '13 at 02:27

The compiler will optimize func_two into something similar to func_one, which will then be optimized further. Long story short: you don't need to worry about this. If you really do need to worry about it, look at the assembly output.

score 0 · Answer 2 · answered Dec 30 '13 at 10:49

Short answer: We can't know

Long answer: it depends highly on how T works and on your compiler's support for return value optimization.

Any function which returns by value can have RVO or NRVO applied to it. This means the return value is constructed directly in storage provided by the calling function, eliminating the copy-constructor call. Since that copy is the main cost of returning large objects by value, this means a substantial gain in performance.

The difference between func_one and func_two is that func_one returns an anonymous temporary value, an r-value; this means RVO can trivially be used. func_two returns a named value, an l-value, so NRVO, a much harder optimization, will be used. However, func_two is trivial, so it will almost certainly have NRVO applied, and both functions will be basically identical.

This is assuming you have a modern or even semi-modern compiler; if not, it will depend highly on how you implemented T.
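
To check this on your own compiler, here is a minimal sketch that counts copy and move constructions. Big, the two global counters, and copies_made_by are hypothetical names made up for illustration; they stand in for the question's T. The only copy that should be counted is the one made for the by-value parameter. The number of moves may differ between compilers and flags, depending on whether NRVO is applied.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Global counters for this sketch (hypothetical instrumentation).
int copies = 0;
int moves = 0;

// Big: a hypothetical stand-in for the question's large T.
struct Big {
    std::vector<int> data;

    explicit Big(std::size_t n) : data(n, 1) {}
    Big(const Big& other) : data(other.data) { ++copies; }
    Big(Big&& other) noexcept : data(std::move(other.data)) { ++moves; }

    // Element-wise sum; assumes both operands have the same size.
    friend Big operator+(const Big& a, const Big& b) {
        Big sum(a.data.size());
        for (std::size_t i = 0; i < sum.data.size(); ++i) {
            sum.data[i] = a.data[i] + b.data[i];
        }
        return sum;  // named local: NRVO candidate, otherwise moved
    }
};

// Returns an unnamed temporary.
Big func_one(Big obj) { return obj + obj; }

// Returns a named local.
Big func_two(Big obj) {
    Big output = obj + obj;
    return output;
}

// Resets the counters, calls f on an lvalue Big, and reports how many
// copy constructions happened during the call.
template <typename F>
int copies_made_by(F f) {
    Big input(1000);
    copies = 0;
    moves = 0;
    Big out = f(input);  // the by-value parameter accounts for one copy
    assert(out.data.front() == 2);
    return copies;
}
```

Calling copies_made_by(func_one) and copies_made_by(func_two) should report the same number of copies; printing moves afterwards shows whether your compiler elided the remaining constructions or fell back to moving.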

If T has move semantics, your compiler can move rather than copy wherever a copy is not elided. This applies to both functions: func_one returns a temporary, and although func_two returns a named value, since C++11 a local variable named in a return statement is treated as an rvalue first, so it is moved rather than copied when NRVO is not applied.
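
On the named-value case: since C++11, a local variable named in a return statement is treated as an rvalue first, so a movable type is moved rather than copied when elision does not happen. A minimal sketch, assuming a hypothetical Tracked type and a function pick that returns one of two named locals (a shape where compilers typically cannot apply NRVO to both):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical counters for this sketch.
struct MoveStats {
    int copies = 0;
    int moves = 0;
};
MoveStats stats;

// Tracked: a hypothetical movable type that records copies and moves.
struct Tracked {
    std::vector<int> data;

    explicit Tracked(std::size_t n) : data(n, 0) {}
    Tracked(const Tracked& o) : data(o.data) { ++stats.copies; }
    Tracked(Tracked&& o) noexcept : data(std::move(o.data)) { ++stats.moves; }
};

// Returns one of two named locals, chosen at run time.
Tracked pick(bool first) {
    Tracked a(10);
    Tracked b(20);
    if (first) {
        return a;  // treated as an rvalue: moved if not elided
    }
    return b;      // same here
}

// Resets the counters, calls pick, and returns the recorded counts.
MoveStats stats_for_pick(bool first) {
    stats = MoveStats{};
    Tracked t = pick(first);
    assert(t.data.size() == (first ? 10u : 20u));
    return stats;
}
```

The copies field should stay at zero for both branches; the moves field depends on how much your compiler manages to elide.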

Finally, it depends on how operator+ and operator= are implemented. If, for example, they were implemented as expression templates, then func_two still requires an assignment, which will slow it down, whereas func_one will simply return a highly optimized temporary.

In summary: in almost all practical contexts, these are identical. In the vanishingly small window where your compiler is acting very strangely, func_one is almost universally faster.

score 0 · Answer 3 · edited May 23 '17 at 12:28

Modern compilers can transform the version with the extra variable into the one without; this is called named return value optimization. It is quite a frequent source of questions here on SO; see "Why isn't the copy-constructor called when returning LOCAL variable", for example. So this is not the overhead you should worry about.

The overhead you should worry about is the function call overhead. An integer addition takes a modern CPU at most a single cycle. A function call takes roughly 10 to 20 cycles, depending on the number of arguments.

I am a bit unsure what you mean by T in your question (is it a template parameter? is it a class? is it a placeholder for a type that you didn't want to disclose in your question?). However, whether you have a function call overhead problem depends on that type, and on whether your compiler can inline your function.

Obviously, if it's inlined, you're fine, there's no function call overhead.
If T is a complex type with an expensive operator+() overload, then you are fine as well.
However, if T is int, for instance, and your function is not inlined, then you have roughly 90% overhead in your function.
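
If T is a template parameter, a minimal sketch could look like the one below. The template form and the demo function are assumptions for illustration, not something the question states. Because the template's definition is visible at every call site, the compiler has the chance to inline it; whether it actually does depends on the compiler and the optimization level.

```cpp
#include <cassert>
#include <string>

// Assumed template form of the question's func_one.
// The definition is visible to callers, so it is a candidate for inlining.
template <typename T>
T func_one(T obj) {
    return obj + obj;
}

// Example calls: a cheap T (int) and a T with a more expensive operator+.
inline void demo() {
    assert(func_one(21) == 42);
    assert(func_one(std::string("ab")) == "abab");
}
```

For int, the call overhead dominates unless the call is inlined; for std::string, the cost of operator+ itself dominates either way.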
