Returning vector of pointers - understanding

Question

I'm trying to understand the following (Lets pretend MyStorageClass is huge) :

class MyStorageClass
{
public:
  string x;
  string y;
  string z;
};

class storage
{
public:
  storage();
  ~storage()
  {
    vector<MyStorageClass *>::iterator it = myVec.begin(); it != myVec.end(); it++)
    {
      delete *it;
    }

  vector<MyStorageClass*> getItems()
  {
    for(int i = 0; i < 10; ++i)
    { 
      v.push_back(new MyStorageClass());
    }
    return v;
  }

private:
  vector<MyStorageClass*> v;

};



main()
{
  storage s;
  vector<MyStorageClass*> vm = s.getItems();
}

From my understading, when s returns the vector and is assigned to vm this is done as a copy (by value). So if s went out of scope and calls it destructor, vm has its own copy and its structure is not affected. However, passing by value is not efficient. So, if you change it to pass by reference:

vector<MyStorageClass*>& getItems()
{
  for(int i = 0; i < 10; ++i)
  { 
    v.push_back(new MyStorageClass());
  }
  return v;
}

You pass the memory location of v (in class Storage). But you still assign a copy of it using the = operator to the vector vm in the Main class. So, vm is independent of v, and if Storage destructor is called vm unaffected.

Finally, if getItems returned a reference and in main you had the following:

main()
{
  storage s;
  vector<MyStorageClass*> &vm = s.getItems();
}

Now, vm holds the address of v. So it is affected by the Storage destructor.

Question: Is what I've stated above true?

Sounds correct to me. Not an answer as I haven't formally verified. — Simon Richter, Dec 01 '11 at 13:40
`myVec` isn't declared anywhere. Did you mean `v` in `~storage`? — outis, Dec 01 '11 at 13:53

Björn Pollex · Accepted Answer · 2011-12-01T13:46:40.407

Yes, you have understood this correctly. Now, if you want to take it to the next level, start by finding reasons why this is bad:

Returning pointer or references to class internals breaks encapsulation.
The reference you get will become a dangling reference once the corresponding s-object is destroyed, thereby breaking the invariant that references are always valid.
By returning a vector of pointers you leave the caller wondering weather he has to delete those pointers or not.

A better solution would be for storage to expose methods begin and end that return the corresponding iterators from s. Alternatively, you can use the Visitor-pattern for algorithms that need to operate on s.

Also, in your case it seems like the vector s is supposed to own the objects it contains. That would be a good indicator for use of boost::ptr_vector.

+1, for mentioning dangling since I dropped the ball the first time :) — John Humphreys, Dec 01 '11 at 13:50

codeling · Answer 2 · 2011-12-19T14:23:18.577

Even though a vector copies its values, the values in your case are the pointers, and not the objects pointed to. So "vm has it's own copy" is not fully true: The resulting vector in your first piece of code will have a copy of the pointers, but not of the MyStorageClass objects they are pointing to; so, actually in all of your 3 code examples the pointers stored in vm would not be valid anymore if the Storage destructor is called!

If you need to make sure, though, that Storage isn't destroyed before the last access to one of the MyStorageClass objects anyway, then, of the presented methods, the third way would be the preferred one, since the vector data (i.e. the pointers) is in memory only once then. You should consider returning a const vector, however, else every caller of getItems can modify the v vector of the Storage class through the returned reference.

As others have already pointed out, exposing the vector itself might not be that good of an idea in the first place; you might consider using iterators instead or the Visitor pattern.

Furthermore, the usage of pointers - where not absolutely needed - is in bigger projects usually rather frowned upon. Consider using smart pointers, such as std::auto_ptr (not very recommendable though due to strange copy semantics), or the more easily understandable boost::shared_ptr / std::tr1::shared_ptr.

Just as a side note, the first and second code example will (at least in most modern compilers) have the exact same performance, since in the first case, the compiler can optimize the temporary return value away (check out "Return value optimization").

score 0 · Answer 3 · answered Dec 01 '11 at 13:46

I think what you've stated is true, but I'm a little confused on the purpose.

vector<MyStorageClass*> &vm = s.getItems();

gets a vector by reference. But the vector contains pointers, so the vector going out of scope wouldn't cause any destructors to run in the first place - the destructors would only run if these were smart pointers of some kind (and even then it depends).

So, you can happily pass a vector of pointers around by value without it being much of an issue. I'm sure you'll save some efficiency by making it by reference, but I don't think its as severe as you may think.

Moreover, your pointed-to-objects are created with new (dynamically), so you can return the vector by value the same way without any fear of losing the pointed to objects.

So, again, I think your logic is fine, and your reference way is a little more efficient, but I just wanted to make sure that you knew that it would work both ways without a problem :) (and since its a vector of pointers, by value isn't too bad either).

PS: in terms of reference, you do have to worry about dangling which may be more frustrating than you may think as pointed out by Bjorn above. If you've ever used string.c_str(), you may have gotten a taste of this. If you get a .c_str() from a string and the original string goes out of scope, the pointer returned by .c_str() is dangling (points to memory that is no longer used for that purpose) and accessing it results in undefined behavior. So, the by value would probably be the better option (but it does depend on your design - for example, if this is going to be a singleton persisting for the duration of your application, dangling probably isn't an issue).

score 0 · Answer 4 · answered Dec 01 '11 at 16:31

Be aware that your storage class will cause problems if objects of class storage are copied. Since you provide no copy constructor or assignment operator, the default will be used. The default will blindly copy the vector of pointers and now you have two storage objects that will both try to delete the pointers in the vector.

score 0 · Answer 5 · edited May 23 '17 at 12:27

From my understanding, when s returns the vector and is assigned to vm this is done as a copy (by value). So if s went out of scope and calls it destructor, vm has its own copy and its structure is not affected.

Plain pointers (T*) aren't followed by std::vector's copy constructor or assignment operator (though you could use a smart pointer class that would copy). Since what the pointers point to (the "pointed") aren't copied, the copy operation is a shallow copy. While operations on s.v won't affect vm (and vice versa), anything that affects the pointed accessed from one will affect the other, since they're the same.

If you were to store an appropriately implemented smart pointer in s.v rather than MyStorageClass*, the standard std::vector copy operations could produce deep copies, so that the contents of s.v could be altered without affecting the contents of vm in any way (and vice versa). The copy operations of the copy-pointer would use the copy operations of the pointed class to duplicate the pointed object. As a result, every pointed object would be pointed to by only one copy-pointer.

Alternatively (as others have mentioned), you can use a smart pointer that allows for shared ownership and let it manage the pointed objects, doing away with the delete call in ~storage.

However, passing by value is not efficient.

It is, however, correct in some circumstances, specifically when mutating one instance of the container must not affect the other (a case you mention), in which case you can't get away without a copy. Correct trumps efficient. Generally speaking, you can use the copy-swap idiom to cut down on inefficiency due to creation of temporaries. As far as I know, most STL implementations don't use copy-swap for std::vector. Copy-on-write, which will perform a copy only when the vector is changed, can also help. Threading complicates copy-on-write and can introduce inefficiencies.

Now, vm holds the address of v.

vm doesn't hold an address. While references may use pointers under the hood (they also might not; they might not even require additional storage, according to §8.3.2-3 of C++03), at the language level references aren't pointers. Consider that you can have null pointers, but not null references. A more accurate statement is that vm is aliased to v (vm and v refer to the same object), which is why operations on v will affect vm (and vice versa).

Returning vector of pointers - understanding

5 Answers5