How to implement Copy-on-Write?

Question

I want to implement a copy-on-write on my custom C++ String class, and I wonder how to.

I tried to implement some options, but they all turned out very inefficient.

What is your memory allocation strategy ? You may want to rely on Pool Allocation for better performance there. — Matthieu M., Oct 30 '09 at 12:47
I hope this is just for learning. There are so many pitfalls to getting this to work correctly in all situations. — Martin York, Oct 30 '09 at 16:41

score 18 · Answer 1 · edited May 25 '12 at 13:30

18

In a multi-threaded environemnt (which is most of them nowadays) CoW is frequently a huge performance hit rather than a gain. And with careful use of const references, it's not much of a performance gain even in a single threaded environment.

This old DDJ article explains just how bad CoW can be in a multithreaded environment, even if there's only one thread.

Additionally, as other people have pointed out, CoW strings are really tricky to implement, and it's easy to make mistakes. That coupled with their poor performance in threading situations makes me really question their usefulness in general. This becomes even more true once you start using C++11 move construction and move assignment.

But, to answer your question....

Here are a couple of implementation techniques that may help with performance.

First, store the length in the string itself. The length is accessed quite frequently and eliminating the pointer dereference would probably help. I would, just for consistency put the allocated length there too. This will cost you in terms of your string objects being a bit bigger, but the overhead there in space and copying time is very small, especially since these values will then become easier for the compiler to play interesting optimization tricks with.

This leaves you with a string class that looks like this:

class MyString {
   ...
 private:
   class Buf {
      ...
    private:
      ::std::size_t refct_;
      char *data_;
   };

   ::std::size_t len_;
   ::std::size_t alloclen_;
   Buf *data_;
};

Now, there are further optimizations you can perform. The Buf class there looks like it doesn't really contain or do much, and this is true. Additionally, it requires allocating both an instance of Buf and a buffer to hold the characters. This seems rather wasteful. So, we'll turn to a common C implementation technique, stretchy buffers:

class MyString {
   ...
 private:
   struct Buf {
      ::std::size_t refct_;
      char data_[1];
   };

   void resizeBufTo(::std::size_t newsize);
   void dereferenceBuf();

   ::std::size_t len_;
   ::std::size_t alloclen_;
   Buf *data_;
};

void MyString::resizeBufTo(::std::size_t newsize)
{
   assert((data_ == 0) || (data_->refct_ == 1));
   if (newsize != 0) {
      // Yes, I'm using C's allocation functions on purpose.
      // C++'s new is a poor match for stretchy buffers.
      Buf *newbuf = ::std::realloc(data_, sizeof(*newbuf) + (newsize - 1));
      if (newbuf == 0) {
         throw ::std::bad_alloc();
      } else {
         data_ = newbuf_;
      }
   } else { // newsize is 0
      if (data_ != 0) {
         ::std::free(data_);
         data_ = 0;
      }
   }
   alloclen_ = newsize;
}

When you do things this way, you can then treat data_->data_ as if it contained alloclen_ bytes instead of just 1.

Keep in mind that in all of these cases you will have to make sure that you either never ever use this in a multi-threaded environment, or that you make sure that refct_ is a type that you have both an atomic increment, and an atomic decrement and test instruction for.

There is an even more advanced optimization technique that involves using a union to store short strings right inside the bits of data that you would use to describe a longer string. But that's even more complex, and I don't think I will feel inclined to edit this to put a simplified example here later, but you never can tell.

edited May 25 '12 at 13:30

daramarak

6,115
1
31
50

answered Oct 30 '09 at 10:34

Omnifarious

54,333
19
131
194

Got a link for that analysis? I've heard similar claims, and I've always wondered about the details. – Adrian McCarthy Oct 30 '09 at 15:30
5

I argue that CoW is a huge benefit to programmer productivity on any platform. You no longer need to deal with explicit sharing, take care of references, make sure you copy when needed. Practically all modern platforms support fast atomic operations and CoW is a helluva lot cheaper than doing a deep copy every time (as Herb wants). Your poor performance argument is moot IMO. – CMircea Mar 08 '10 at 23:02
@iconiK, show me numbers and we'll talk. My argument is based on actual numbers resulting from empirical tests, not handwaving about the speed of atomic operations and bald assertions that deep copying is more expensive. Atomic operations require memory barriers to implement, and those can be quite expensive. I want to see numbers showing that deep copying is more expensive than atomic reference counts before I alter my stance. – Omnifarious Mar 09 '10 at 06:45
1

From what I can tell, the copy-on-write implementations mentioned work by checking whether there presently exists exactly one reference to a string, an operation with many intricacies. If wrappers don't have to be thread-safe but the wrapped strings do, what would be the effects of having each wrapper hold a pointer to an mutable version of a string and a pointer to an immutable version, with the proviso that either exactly one pointer would be non-null, or they would point to identical strings? The mutable pointer, if non-null, would be the only reference anywhere to the string... – supercat Oct 20 '11 at 19:31
...while the immutable pointer would never be used in such a way as to allow it to be written. To clone a string whose Immutable pointer is null, AtomicExchange the Mutable pointer with null and store it in Immutable pointer. Otherwise just copy the Immutable pointer. To mutate a string whose Mutable pointer is null, store in MutablePointer a reference to a copy of the string in ImmutablePointer. Then, to mutate a string whose Mutable pointer is (perhaps newly) non-null, AtomicExchange the Mutable pointer to null, do the change, set the pointer back, and invalidate the Immutable pointer. – supercat Oct 20 '11 at 19:36
If one wanted to avoid interlocked operations on mutations, one could make the first attempt to duplicate a wrapper generate a new string (rather than using AtomicExchange to swap out the Mutable pointer and store its old value into ImmutablePointer). That would cause some extra copy operations, but eliminate some atomic operations. – supercat Oct 20 '11 at 19:39
@supercat: I would have to think about this. What if a reference to the string was shared between threads? It seems an awful lot to me like the immutable pointer in the 'mutable to immutable' example is an example of double-checked locking if two threads want to make a copy of it at nearly the same time. – Omnifarious Oct 20 '11 at 23:15
@Omnifarious: The level of locking and interlocking required depends upon what type of protection one wants to provide if multiple threads manipulate the same *logical* object simultaneously. The optimal balance between speed and semantics would be to say that (1) simultaneous reads to a logical object whose last access was a mutation may make one or two separate immutable copies; (2) mutations to a logical object performed simultaneous to other reads or writes may corrupt that logical object, but no logical object should be visibly affected by actions (legal or not) performed on another. – supercat Mar 22 '12 at 16:59
@Omnifarious: I was sketching out the above idea a little more, and I think that if one wanted to do something like a copy-on-write array, one could have the whole thing be thread-safe without locking, and with only one CompareExchange on the first mutation following a "copy". In the absence of memory barriers, a thread which performs a read following another thread's write might see old or new data, or a combination thereof; if different threads access different parts of the array, though, there should be no problem. – supercat Jun 05 '12 at 16:15
@Omnifarious: Such a structure would thus seem to avoid the multi-processor bottlenecks associated with more "aggressive" copy-on-write approaches. If one uses separate "Clone" and "CloneAsMutable" methods, one could avoid redundant copy operations in most cases (both methods would have the same semantics, but the former would generate a redundant copy if the destination gets mutated without `Clone` being called on it first, and the latter would perform a redundant copy if the destination never gets mutated. If one can guess what will be done with the clone, one can avoid a redundant copy). – supercat Jun 05 '12 at 16:20
@supercat: I'm glad you're tackling this issue, and your approach looks like it would be interesting in certain contexts. And a new approach to copy-on-write semantics in a multiprocessor environment would certainly be welcome. Personally, since I have to think really hard to even try to follow what you're talking about, I think the approach of simply making things immutable and having to make an explicit mutable copy is the way to go. But I think these cogitations (while interesting and worthwhile) are a little outside the scope of the original question. – Omnifarious Jun 05 '12 at 17:52

score 5 · Answer 2 · answered Jun 05 '12 at 19:45

I would suggest that if one wants to implement copy-on-write efficiently (for strings or whatever), one should define a wrapper type which will behave as a mutable string, and which will hold both a nullable reference to a mutable string (no other reference to that item will ever exist) and a nullable reference to an "immutable" string (references to which will never exist outside things that won't try to mutate it). Wrappers will always be created with at least one of those references non-null; once the mutable-item reference is ever set to a non-null value (during or after construction) it will forever refer to the same target. Any time both references are non-null, the immutable-item reference will point to a copy of the item that was made some time after the most recent completed mutation (during a mutation, the immutable-item reference may or may not hold a reference to a pre-mutation value).

To read an object, check whether the "mutable-item" reference is non-null. If so, use it. Otherwise, check whether the "immutable-item" reference is non-null. If so, use it. Otherwise, use the "mutable item" reference (which by now will be non-null).

To mutate an object, check whether the "mutable-item" reference is non-null. If not, copy the target of the "immutable item" reference and CompareExchange a reference to the new object into the "mutable item" reference. Then mutate the target of the "mutable item" reference and invalidate the "immutable item" reference.

To clone an object, if the clone is expected to be cloned again before it is mutated, retrieve the value of the "immutable-item" reference. If it is null, make a copy of the "mutable item" target and CompareExchange a reference to that new object into the immutable-item reference. Then create a new wrapper whose "mutable-item" reference is null, and whose "immutable-item" reference is either the retrieved value (if it wasn't null) or the new item (if it was).

To clone an object, if the clone is expected to be mutated before it is cloned, retrieve the value of the "immutable-item" reference. If null, retrieve the "mutable-item" reference. Copy the target of whichever reference was retrieved and create a new wrapper whose "mutable-item" reference points to the new copy, and whose "immutable-item" reference is null.

The two cloning methods will be semantically identical, but picking the wrong one for a given situation will result in an extra copy operation. If one consistently chooses the correct copy operation, one will get most of the benefit of an "aggressive" copy-on-write approach, but with far less threading overhead. Every data holding object (e.g. string) will either be unshared mutable or shared immutable, and no object will ever switch between those states. Consequently, one could if desired eliminate all "threading/synchronization overhead" (replacing the CompareExchange operations with straight stores) provided that no wrapper object is used in more than one thread simultaneously. Two wrapper objects might hold references to the same immutable data holder, but they could be oblivious to each others' existence.

Note that a few more copy operations may be required when using this approach than when using an "aggressive" approach. For example, if a new wrapper is created with a new string, and that wrapper is mutated, and copied six times, the original wrapper would hold references to the original string holder and an immutable one holding a copy of the data. The six copied wrappers would just hold a reference to the immutable string (two strings total, although if the original string were never mutated after the copy was made, an aggressive implementation could get by with one). If the original wrapper were mutated, along with five of the six copies, then all but one of the references to the immutable string would get invalidated. At that point, if the sixth wrapper copy were mutated, an aggressive copy-on-write implementation might realize that it held the only reference to its string, and thus decide a copy was unnecessary. The implementation I describe, however, would create a new mutable copy and abandon the immutable one. Despite the fact that there are some extra copy operations, however, the reduction in threading overhead should in most cases more than offset the cost. If the majority of logical copies that are produced are never mutated, this approach may be more efficient than always making copies of strings.

score 3 · Answer 3 · answered Oct 30 '09 at 10:36

3

There's not much to CoW. Basically, you copy when you want to change it, and let anyone who doesn't want to change it keep the reference to the old instance. You'll need reference counting to keep track of who's still referencing the object, and since you're creating a new copy, you need to decrease the count on the 'old' instance. A shortcut would be to not make a copy when that count is one (you're the only reference).

Other than that, there's not much that can be said, unless there's a specific problem you're facing.

answered Oct 30 '09 at 10:36

falstro

34,597
9
72
86

3

The devil is in the details: how do you approach operator []? Do you return a char& and always copy, assuming it will change? Do you return char and never copy, forbidding modification of separate characters? Do you return a proxy object and copy on assignment? So many questions, and not a single correct answer :) – sbk Oct 30 '09 at 10:49
@sbk: easy answer? don't use it. :) for example, you could have get/set methods for single character manipulations. – falstro Oct 30 '09 at 12:15
@roe: But that would be one crippled string class... I remember how disgusted I was when I saw Java's charAt method on strings. Yuck – sbk Oct 30 '09 at 13:26
@sbk: may be, but giving someone a reference/pointer to the internals is going to hurt you either way. Even if you copy now, someone else might get a read-reference to your object at a later stage. No-one an be allowed to interact with anything but the string object. You could work around it though by implementing a character-pointer-object (returned by the [] operator) which will do the copy in the assignment operator. That's actually a pretty interesting idea, thanks! – falstro Oct 30 '09 at 13:46
@roe: I'm glad something was born in this discussion :) Anyway, in my opinion CoW is just not worth the extra complexity. I'd rather go for the immutable string + string builder combination as in .Net/Java/Python and pretty much any other "modern" language. – sbk Oct 30 '09 at 14:12

score 1 · Answer 4 · answered Oct 30 '09 at 10:51

You may want to emulate the 'immutable' string that other languages have (Python, C# as far as I know).

The idea is that each string is immutable, thus any work on a string create a new immutable one... or that's the basic idea, to avoid explosion, you would need not to create another if there is a similar one.

score 0 · Answer 5 · answered Feb 28 '12 at 17:25

0

template <class T> struct cow {
  typedef boost::shared_ptr<T> ptr_t;
  ptr_t _data;

  ptr_t get()
  {
    return boost::atomic_load(&_data);
  }

  void set(ptr_t const& data)
  {
    boost::atomic_store(&_data, data);
  }
}

answered Feb 28 '12 at 17:25

bobah

18,364
2
37
70

@daramarak cow::set() releases a reference to the old data without touching it, if no one else is having a reference to the old data it via having called cow::get() before, the data gets deleted. Think of how cow works. – bobah May 25 '12 at 13:47
Usually with cow you want a single cow object that is unshared being repeatly modified to not create new objects or do allocations. – Yakk - Adam Nevraumont Oct 14 '17 at 04:12
@Yakk - it’s a different pattern called copy on read or simply conflation – bobah Oct 14 '17 at 06:48

How to implement Copy-on-Write?

5 Answers5

Linked