18

I have seen the copy-and-swap idiom recommended in various places as the recommended/best/only way to implement strong exception safety for the assignment operator. It seems to me that this approach also has a downside.

Consider the following simplified vector-like class which utilizes copy-and-swap:

class IntVec {
  size_t size;
  int* vec;
public:
  IntVec()
    : size(0),
      vec(0)
  {}
  IntVec(IntVec const& other)
    : size(other.size),
      vec(size? new int[size] : 0)
  {
    std::copy(other.vec, other.vec + size, vec);
  }

  void swap(IntVec& other) {
    using std::swap;
    swap(size, other.size);
    swap(vec, other.vec);
  }

  IntVec& operator=(IntVec that) {
    swap(that);
    return *this;
  }

  //~IntVec() and other functions ...
}

Implementing the assignment via the copy constructor may be efficient and guarantee exception safety, but it can also cause an unneeded allocation, potentially even causing an uncalled for out-of-memory error.

Consider the case of assigning a 700MB IntVec to a 1GB IntVec on a machine with a <2GB heap limit. An optimal assignment will realize it already has enough memory allocated and only copy the data into it's already allocated buffer. The copy-and-swap implementation will cause an allocation of another 700MB buffer before the 1GB one is released, causing all 3 to try co-exist in memory at once, which will throw an out-of-memory error needlessly.

This implementation would solve the problem:

IntVec& operator=(IntVec const& that) {
  if(that.size <= size) {
    size = that.size;
    std::copy(that.vec, that.vec + that.size, vec);
  } else
    swap(IntVec(that));
  return *this;
}

So the bottom line is:
Am I right that this is a problem with the copy-and-swap idiom, or do normal compiler optimizations somehow eliminate the extra allocation, or am I overlooking some problem with my "better" version that the copy-and-swap one solves, or am I doing my math/algorithms wrong and the problem doesn't really exist?

Community
  • 1
  • 1
Baruch
  • 20,590
  • 28
  • 126
  • 201
  • 1
    Personally I basically don't use copy and swap because, as you point out, it's virtually never the most efficient technique... and we use c++ for efficiency. – David May 28 '14 at 17:50
  • 2
    Don't write programs that copy 700 megabyte vectors. At least not 700 megabyte vectors of things that have constructors and destructors and can throw exceptions. – Kaz May 28 '14 at 18:31
  • @Dave: Copy and swap has one extra move over optimal, that the compiler will probably optimize out. The only time that ought to be an issue is if your move is heavyweight (like `std::array`) – Mooing Duck May 28 '14 at 23:02
  • Copy and swap is also recommended because it's only 3 lines of code, and it's relatively copy-paste. Less chance of an error. Optimized code requires significantly more code. – Mooing Duck May 28 '14 at 23:08

6 Answers6

15

There are two problems with the implementation reusing the space

  • If you're assigning very_huge_vect = very_small_vect; the extra memory will not be released. This may be what you want or may be not.

  • In case of integers all is fine, but what about objects for which the copy operation may throw an exception? You'll end up with a messed up array where part of the copy has been done and that has been truncated. Much better would be to leave the target untouched if the copy operation fails (what the swap idiom does).

By the way, as a general rule, in very few cases you can find anything that looks like "always the best solution". If you're looking for a silver bullet, programming is not going to be the right place.

6502
  • 112,025
  • 15
  • 165
  • 265
12

To fix your particular problem, modify copy-swap to clear-copy-swap.

This can be made somewhat generic via:

Foo& operator=( Foo const& o ) {
  using std::swap;
  if (this == &o) return *this; // efficient self assign does nothing
  swap( *this, Foo{} ); // generic clear
  Foo tmp = o; // copy to a temporary
  swap( *this, tmp ); // swap temporary state into us
  return *this;
}
Foo& operator=( Foo && o ) {
  using std::swap;
  if (this == &o) return *this; // efficient self assign does nothing
  swap( *this, Foo{} ); // generic clear
  Foo tmp = std::move(o); // move to a temporary
  swap( *this, tmp ); // swap temporary state into us
  return *this;
}

while this does cause that large allocation to happen, it happens immediately after the large deallocation occurs.

The key part of copy-swap is that it makes a correct implementation, and getting exception-safe copy right is tricky.

You'll note that the above, if an exception is thrown, results that the lhs could be in an empty state. In comparison, the standard copy-swap will result in a valid copy, or the lhs being unchanged.

Now, there is one final little trick we can do. Suppose our state is captured in a vector of substates, and those substates have exception-safe swap or move. Then we can:

Foo& move_substates( Foo && o ) {
  if (this == &o) return *this; // efficient self assign does nothing
  substates.resize( o.substates.size() );
  for ( unsigned i = 0; i < substates.size(); ++i) {
    substates[i] = std::move( o.substates[i] );
  }
  return *this;
}

which emulates your copy-contents, but does it by move instead of copy. We can then exploit this in our operator=:

Foo& operator=( Foo && o ) {
  using std::swap;
  if (this == &o) return *this; // efficient self assign does nothing
  if (substates.capacity() >= o.substates.size() && substates.capacity() <= o.substates.size()*2) {
    return move_substates(std::move(o));
  } else {
    swap( *this, Foo{} ); // generic clear
    Foo tmp = std::move(o); // move to a temporary
    swap( *this, tmp ); // swap temporary state into us
    return *this;
  }
}

and now we reuse our internal memory and avoid an allocation if we are moving from a source, and we aren't too much larger than the source, if you are afraid of memory allocations.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524
  • As an aside, `using std::swap; swap(a,b);` is one of the better ways to swap, as it enables ADL for `swap`. You can then have a free `swap` function in the same namespace as your class, and the above will call it: failing that, it calls `std::swap`. – Yakk - Adam Nevraumont May 28 '14 at 19:17
  • How is this better than my implementation? – Baruch May 28 '14 at 20:50
  • @baruch: It's better if copying the data can throw an exception, because your implementation leaves the target in an undefined state, which will probably crash, wheras this one simply leaves the "target" in a valid "empty" state. – Mooing Duck May 28 '14 at 23:05
  • @baruch the point is that `swap` is easy to implement in an exception-safe manner usually. And copy construction/move construction needs to be exception safe already (that takes some care!). So with each step being easy to or having to be already exception-safe, we never end up in a corrupted state. It isn't as good as the pure copy-swap state's exception guarantee, where the operation either completes **or** state remains the way it was before, but it at least isn't a real mess. Finally, I don't waste space. – Yakk - Adam Nevraumont May 29 '14 at 13:44
11

Yes, running out of memory is a potential problem.

You already know what problem copy and swap solves. It is how you "undo" a failed assignment.

With your more efficient method there is no way to go back if the assignment fails at some stage in the process. And, if an object is badly written a failed assignment might even leave the object corrupt and the program will crash during object destruction.

Zan Lynx
  • 53,022
  • 10
  • 79
  • 131
5

Am I right that this is a problem with the copy-and-swap idiom

It depends; if you do have so large vectors, then yes you're right.

am I overlooking some problem with my "better" version that the copy-and-swap one solves

  1. You're optimizing for the rare case. Why perform extra checks and add cyclomatic complexity to your code ? (unless ofcourse, having soo big vectors is the norm in your application)

  2. Since C++11 the r-valuness of temporaries can be harvested by c++. So passing your argument by const& throws away optimizations.

Bottom line: The word always is easy to disprove. If there was a universal solution, always better than any alternative, I suppose the compiler could adopt it and implicitly generate a default assignment operator based on this

On the previous note, the implicitly-declared copy assignment operator has the form

T& T::operator=(const T&);

accepting its argument by const& and not by value (as in the copy and swap idiom)

Nikos Athanasiou
  • 29,616
  • 15
  • 87
  • 153
1

Two caveats with your approach.

  1. Consider the case of assigning a 1KB IntVec to a 1GB IntVec. You'll end up with a massive chunk of allocated but unused (wasted) memory.
  2. What if an exception is thrown during copy? If you're overwriting the existing location, you end up with corrrupted, partially-copied data.

As you pointed out, getting around these issues may not be very memory efficient, but software design is always about tradeoffs.

Unsigned
  • 9,640
  • 4
  • 43
  • 72
0

You can see how STL implements operator= of vector.

template <class _Tp, class _Alloc>
vector<_Tp,_Alloc>& 
vector<_Tp,_Alloc>::operator=(const vector<_Tp, _Alloc>& __x)
{
  if (&__x != this) {
    const size_type __xlen = __x.size();
    if (__xlen > capacity()) {
      iterator __tmp = _M_allocate_and_copy(__xlen, __x.begin(), __x.end());
      destroy(_M_start, _M_finish);
      _M_deallocate(_M_start, _M_end_of_storage - _M_start);
      _M_start = __tmp;
      _M_end_of_storage = _M_start + __xlen;
    }
    else if (size() >= __xlen) {
      iterator __i = copy(__x.begin(), __x.end(), begin());
      destroy(__i, _M_finish);
    }
    else {
      copy(__x.begin(), __x.begin() + size(), _M_start);
      uninitialized_copy(__x.begin() + size(), __x.end(), _M_finish);
    }
    _M_finish = _M_start + __xlen;
  }
  return *this;
}
eric
  • 1
  • 1