3

I wish to seek a bit of feedback (to help me see if my position on the issue is correct or not) regarding returning by const reference vs. returning by value (for performance reasons).

To elaborate:

#include "BigDataStruct.h"

class DataGiver
{
  public:
    BigDataStruct getByValue() const { return myData; };
    const BigDataStruct& getByConstRef() const { return myData; };

  private:
    BigDataStruct myData;
}

My gut feeling says there may (or may not) be a slight performance increase for getByConstRef() over getByValue() in situations like:

const BigDataStruct& someData = someDataGiver.getByConstRef();

but there won't be (hardly) any for calls like:

BigDataStruct someData = someDataGiver.getByConstRef();

Also - which gives me some discomfort - is that in the first scenario someData is tied to actual member in someDataGiver, in the sense that if the member there changes someData also changes and if someDataGiver expires/dies someData becomes... undefined?

As such I try to discourage my coworkers from returning by const reference (unless we actually WANT a reference to the object for other reasons - e.g. in singleton factories). Do I have a point here or am I merely nitpicking and annoying my coworkers for no good reason?

(I am aware of the rule of thumb for optimization: 1) Don't do it 2) Don't do it yet - and use a profiler before actually doing it)

AMA
  • 4,114
  • 18
  • 32
CharonX
  • 2,130
  • 11
  • 33
  • 4
    As always, *measure, benchmark and profile*. And unless it's a proven bottleneck of your code, remember that "good enough" usually *is* good enough. – Some programmer dude Feb 19 '18 at 13:07
  • Sure, but also consider `someDataGiver.getByConstRef().a_member` or `someOtherFunction(someDataGiver.getByConstRef())`... – aschepler Feb 19 '18 at 13:12
  • 1
    Related: https://stackoverflow.com/questions/33994995/which-is-more-efficient-return-a-value-vs-pass-by-reference – AMA Feb 19 '18 at 13:29
  • 2
    I try to avoid getters because they break encapsulation. The semantics are different. One is a data *giver* ( a copy is *forced*) and the other is a data *viewer* (the caller decides to make a copy or just look at what's there). I think the implementation should be based on that rather than efficiency. I don't think the two options are so easily interchangeable. Sometimes a *copy* is **not** what you want - you need a *reference*. Also there are other options. If the callers need to refer to that object after the main object dies then you can consider a `shared_ptr` rather than a reference. – Galik Feb 19 '18 at 13:29
  • Anyways I think this is too broad because I think there is not a general "one is better than the other" rule here, it depends on the situation so a more concrete example is needed for a definitive answer. – Galik Feb 19 '18 at 13:31
  • Also your wording "slight performance increase" could be understating it for a large object. Forcing copies in some situations could seriously kill performance. And I know people say do not prematurely optimize. But equally you should not write gratuitously sluggish code either. And changing from copy semantics to reference semantics is an architectural change, not an optimization. It is not something you can just slip in if the code is too slow. You have to change how you use it throughout the code base. – Galik Feb 19 '18 at 13:40
  • Here's a [tiny benchmark online](http://quick-bench.com/b681dAtaACw9MKzDlE-AFjqkazU). The take-away message: returning by reference improves the performance (when avoiding the copy). If the call requires a copy after returning by reference it has the same performance as returning by value. – AMA Feb 19 '18 at 13:51

2 Answers2

3

Since it's a feedback request, here are my 2 cents. There are two aspects I would like to focus on: performance and safety.

Performance

Returning by reference provides performance improvement by removing extra-copying. Amount of improvement strongly depends on the usage: how big is the object, how tight is the loop, etc.

To bring in some data: here's a benchmark on-line (note: BigDataStruct holds char[256]): enter image description here

  • Returning by reference has an impact. It's up to you to decide if it is significant on case-to-case basis.

  • When doing BigDataStruct someData = someDataGiver.getByConstRef(); performance is the same

Safety

Returning by reference is not memory-safe and is not thread-safe since it provides aliasing to data (BigDataStruct) that could be potentially modified elsewhere (e.g., in DataGiver).

To illustrate it with an example:

const BigDataStruct& someData = someDataGiver.getByConstRef();
// do something with someData
someDataGiver.modifyData(); //<= someData is now changed!
// doing same things with someData can have different outcome

You might think: no sane developer will write anything like that. But here's one that can happen quite often (happened with me):

const BigDataStruct& someData = 
    DataGiver().getByConstRef(); // Bam! A wild dangling reference appears!

Conclusions

Returning by reference can improve performance. But it has few nasty pitfalls that developer has to keep track of. Whether to discourage it or not is a strong recommendation to make. IMO defaulting to returning-by-value and motivating returning-by-reference could be a balanced approach. Also explicitly stating that the value is returned by reference is a good idea (which your team already does with ByConstRef postfix).

Update: as @LightnessRacesinOrbit mentioned in the comment (also see answer by Pavlo K): if you don't have the control of all the getter usages (e.g., developing a library or having external users) returning by reference provides more flexibility by delegating the choice to the final user.

AMA
  • 4,114
  • 18
  • 32
  • 1
    _"IMO defaulting to returning-by-value and motivating returning-by-reference could be a balanced approach"_ Hard to disagree with this from a whole-application perspective. Though Pavlo is also right that if you have (or may have) "users" who aren't you, providing a little more flexibility may tilt that balance. After all, your users shouldn't pay for what they don't use. – Lightness Races in Orbit Feb 19 '18 at 15:00
  • Thank you for the detailed answer. You, LightnessRacesinOrbit and Pavlo K made very good points. The code in question will not be used by 3rd parties, but I'll keep it in mind if we ever develop code that will. I'll continue to use return by value by default, and return by reference where performance is essential. The potential pitfalls of return by value are too high (especially when the effect would be e.g. saving < 0.01s while displaying a pop-up message to the user) – CharonX Feb 19 '18 at 16:28
  • "Return by value" is slower until you realize that "return by reference" potentially requires a memory allocation syscall, which is orders of magnitude slower than just copying around a few bytes. – étale-cohomology Nov 22 '20 at 03:21
2

IMO this is natural way for c++ to return references and let user of сlass to decide how to deal with it.

For example stl containers usually return elements by reference, and that's up to developer copy it to local variable if needed, or use by reference (if developer fully controls element's lifecicle). (Consider std::vector as an example).

So, if there are no other reasons (like making DataGiver thread-safe, etc), I would return by const ref.

Pavlo K
  • 837
  • 9
  • 15
  • A very good point. And while I will probably continue to use return by value by default (in not-performance critical parts of the application) you do give a good argument. Still, I feel the potential pitfalls (Lifetime of the providing object, issues with thread safety) outweight the benefits in the parts of the application where performance is not critical. – CharonX Feb 19 '18 at 16:20