1

Is it bad practice to return a vector element by reference?

class X{
vector<Y> v;

public:
    Y& getRefFromVectorOfY(unsigned int index){
        retrun v.at(index);
    }
    void addToVectorOfY(Y y){
        v.push_back(move(y));
    }
};

While this is efficient and clean, the issue is that it breaks encapsulation. If v is a vector, and a private member of a class, the caller will now have a ref to an element in the private vector that they can not only read from but also assign to (overwrite).

Can do this

x.getRefFromVectorOfY(0).setNumber(7);  //not bad..actually good

or

x.getRefFromVectorOfY(0) = move(Y2);   //very bad!

or even

Y3 = move(x.getRefFromVectorOfY(0));   //OMG

Of course, we may return a const & instead, allowing only const operations.

Returning an element by value is less efficient since it is a copy.

If the get method was to move the element out of the vector to return by value, the vector would lose data integrity since it would no longer store the data that was moved out (element would be reset to default state inside the vector... so... still left in the vector but as junk data).

WHY AM I ASKING THIS? THE PERSPECTIVE

If you're following the rule of zero, and you have a private vector member in a class, you need convenience methods to axs and set the vector. This raises the question how to return values.

The problem with return by ref is that it breaks encapsulation unless it is const ref.

So, should I only return a const ref forcing any setting of values to use the setVector method? The problem with this is that sometimes you store custom type...like Y..in the vector and you need to access Y's non-const methods. While a const & protects the vector element from being re-set or overwritten in the vector, it also prevents caller from using the returned elements non-const methods.

So if I only return const &... I can't do

x.getRefFromVectorOfY(0).setNumber(7);  //Y has a setNumber method

and that would be a problem. It would be a problem because I don't want to re-implement any of Y's methods in class X. That would be another layer of indirection and a lot of code redundancy. This speaks against having a return a const & general policy. Not practical in many cases.

So I would have to override based on constness of return value, which you can't do.

So I would have to have two get functions with diff names, one returning const &... and other just returning a &. That feels dirty.

const Y& getConstRefFromVectorOfY(unsigned int index){
    retrun v.at(index);
}
Y& getRefFromVectorOfY(unsigned int index){
    retrun v.at(index);
}

EDIT

I added this table to summarize the issues and the goals.

I want the caller of class X be able to do the following safely and efficiently:

  • MOD VECTOR
  • MOD ELEMENT IN VECTOR
  • READ ELEMENT IN VECTOR

Summery of options as I understand them:

"-" is a con and "+" is a pro.

OPTION 1 -- Return copy of vector element

    MOD VECTOR
        + Can mod vector via class method addToVectorOfY
    MOD ELEMENT IN VECTOR
        - No way to mod Y element itself in vector without 
          indirection & redundancy (re-implementation of some Y methods)!!!  
          Because modifying returned copy does not modify what is in vector.
    READ ELEMENT IN VECTOR
        + Yes, inefficiently can read element data by looking at copy of element

OPTION 2 -- Return ref of vector element

   MOD VECTOR
        - Can mod vector in non-obious ways via returned ref to element. Breaks encapsulation.
          Dangerous and also redundant because class has setter method to mod vector.
        + Can mod vector via class method addToVectorOfY
    MOD ELEMENT IN VECTOR
        + Can call non-const methods of element returned
        - Some danger to the reference being invalid upon vector resize
    READ ELEMENT IN VECTOR
        + Yes, with maximum efficiency
        - Some danger to the reference being invalid upon vector resize

OPTION 3 -- Return const ref of vector element

   MOD VECTOR
        + Can mod vector via class method addToVectorOfY
    MOD ELEMENT IN VECTOR
        - No way to mod Y element itself in vector without 
          indirection & redundancy (re-implementation of some Y methods)!!! 
    READ ELEMENT IN VECTOR
        + Yes, with maximum efficiency
        - Some danger to the reference being invalid upon vector resize 

Is there a way for a user of class X to do all 3 safely and efficiently?

code
  • 1,056
  • 10
  • 22
  • 2
    I'm not sure I see where the rule of 0/3/5 comes into play here. – François Andrieux Mar 05 '18 at 19:40
  • 1
    Returning a reference forces you to have a member of the specified type, and the type of members is an implementation detail. That's why returning by reference (or const reference) breaks encapsulation. It's a tradeoff (returning references is "free" but breaks encapsulation). You could always use a [span](https://stackoverflow.com/q/45723819/1896169) – Justin Mar 05 '18 at 19:44
  • Fair point francois. – code Mar 05 '18 at 19:45
  • 2
    Returning a reference seems very dangerous; if calling code holds on to that reference and then the vector is resized, the reference will be invalidated and the caller will have no way of knowing. – 0x5453 Mar 05 '18 at 19:48
  • 0x5453, that is a great point. – code Mar 05 '18 at 19:50
  • 2
    "Encapsulation" doesn't mean "hide all data in all ways". – Lightness Races in Orbit Mar 05 '18 at 20:05
  • Not enough context to say for sure, but another option could be to make the vector public. – juanchopanza Mar 05 '18 at 20:08
  • Also whether you return be value or reference should be determined by the desired semantics, not by efficiency arguments. – juanchopanza Mar 05 '18 at 20:10

2 Answers2

9

Well, std::vector's own operator[] returns a non-const reference, so it's not bad practice in itself. Specifically, this doesn't "break encapsulation" - a vector is not supposed to encapsulate away its members, it's supposed to provide access to them.

Also, all of your 3 examples are fine, not just the first one. It's not "very bad" nor "OMG".

Having said that, remember that an std::vector's storage gets re-allocated as its size changes (at least - during size increases), so it doesn't guarantee the validity of pointers/references into it.

So what's bad is to keep references (and pointers, and opaque iterators) into an std::vector around when it might get resized.

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • With that in mind, then most (all?) of the standard containers follow this practice – Daniel Mar 05 '18 at 19:57
  • @Daniel: The API makes it very clear when this can happen. The same may not be true for OPs class, what with the additional level of indirection to the storage. – AndyG Mar 05 '18 at 20:02
  • Hi, thanks. So, I'm not sure what you're recommending. I edited the OP to make options clear. I added a table to summarize trade-offs as I understand them. Are you suggesting that option 2 is a fine choice? – code Mar 05 '18 at 23:01
  • @code: Use any one of the options - decide on a case-by-case basis. When returning a value, remember there's [RVO](https://en.wikipedia.org/wiki/Copy_elision#Return_value_optimization), so you might not actually be paying the price of making a copy. – einpoklum Mar 05 '18 at 23:11
  • NRVO is only for temps where the instantiation is on the same line as the return statement. It doesn't apply to case above, for stored items in a member vector. I'm not overly worried about the performance loss of a copy... but I option 1 has a huge con of not being able to modify element in vector. Because modification modifies the returned copy... and not the item in the vector. – code Mar 05 '18 at 23:19
  • @code: Fine, then - option 2. Nothing to be scared about. – einpoklum Mar 05 '18 at 23:35
1

Is it bad practice to return a vector element by reference?

Yes, in general? In most cases, I would think if you took the time to encapsulate a member variable of a class as private in order to control the access and manipulation of that variable, designing a member function that will easily break that control renders the first step moot. This may not always be true depending on the use case, but here you pose the problem so abstractly that it is hard to give a specific answer. The only real problem I can identify in your post is this:

Returning an element by value is less efficient and since it is a copy, it is less efficient.

I guess the real question to ask here is there a meaningful, measurable performance difference between maintaining greater access control to the member variable versus having more direct access to the underlying memory so you can manipulate it faster? You are right that the return by reference is more efficient in some ways, but does that actually make a practical difference in your particular code?

Additionally, it also matters what level of data integrity you need to maintain for the private member variables you are exposing. einpoklum makes a great point that many standard containers follow this paradigm. They have no expectations on the values that are stored in the container, only that they maintain control over allocation/deletion of the memory held by them. Your class may have a stronger control requirements about what values the member values take. For example, if all the data elements in that vector needed to be non-negative, then by exposing a reference to that memory you lose the ability to have the class make those kind of data integrity guarantees. It really just depends on the requirements, although I prefer the paradigm of selectively releasing control over a member variable as needed rather than giving full access and slowly taking it away when you want to add additional guarantees.

Daniel
  • 1,291
  • 6
  • 15