21

How do you implement an efficient and thread safe reference counting system on X86 CPUs in the C++ programming language?

I always run into the problem that the critical operations not atomic, and the available X86 Interlock operations are not sufficient for implementing the ref counting system.

The following article covers this topic, but requires special CPU instructions:

http://www.ddj.com/architect/184401888

Fabian
  • 1,824
  • 3
  • 19
  • 35

7 Answers7

12

Nowadays, you can use the Boost/TR1 shared_ptr<> smart pointer to keep your reference counted references.

Works great; no fuss, no muss. The shared_ptr<> class takes care of all the locking needed on the refcount.

Michael Burr
  • 333,147
  • 50
  • 533
  • 760
  • Also note that Technical Report 1 (TR1) is a draft paper of items for inclusion in the C++0x standard. As C++0x comes closer, these will be even better supported as well (they are based on the Boost library). – Kris Kumler Sep 18 '08 at 15:23
  • I have my doubts about the lock-free implementation but this is likely still the best choice. Multi-Thread is fiendishly hard to do in C++ with all the tricks the compiler may do to your code. – Jeroen Dirks Sep 18 '08 at 16:58
  • 11
    Man... Either I'm missing something, or something is wrong with this answer... Why did it get accepted and up-voted so many times, if it doesn't even answer the original question - it just suggests something else that already has a working implementation of the same thing that was in OQ? – Paulius Dec 08 '08 at 11:45
  • @Paulinus - probably the person asking the question (and others) are simply looking for a working implementation that can be readily used. – Michael Burr Dec 08 '08 at 15:24
  • Warning: boost::shared_ptr is not thread-safe on all critical operations. It is not thread-safe on swap for instance. – Catskul Dec 19 '12 at 19:08
  • 1
    @Catskul: That is documented as "shared_ptr objects offer the same level of thread safety as built-in types": http://www.boost.org/doc/libs/1_52_0/libs/smart_ptr/shared_ptr.htm#ThreadSafety. While you cannot modify the same `shared_ptr` instance from multiple threads concurrently, there's no thread-safety problem modifying separate `shared_ptr`'s that share ownership of an object. – Michael Burr Dec 20 '12 at 02:21
  • @Michael-Burr That's not the same as being atomic for all critical operations though. For atomicity to be even relevant there has to be a point at which one of those shared_ptrs are used in multiple threads. Some operations conceivably used in those cases are not atomic and so there needs the caveat. – Catskul Dec 20 '12 at 03:23
  • 1
    @Catskul: the use-case for shared pointers is to have different instances of shared pointers that point to the same object. Modifying those shared pointers in different threads is thread safe (though the same isn't necessarily true for using the object pointed to from multiple threads). Shared pointers solve the problem of managing the lifetime of a shared object. They don't solve all multithreading issues, including modification of a particular instance of a shared pointer from multiple threads. – Michael Burr Dec 20 '12 at 05:03
  • @Michael-Burr right. I think we're saying the same thing: they're somewhat, but not completely thread-safe. That fact that some of the operations are not thread-safe, though, should be in your answer, because otherwise it's misleading and will cause people to accidentally mis-use them. I know this first hand because I trusted another similar answer elsewhere, and got burned for not reading the docs first hand. – Catskul Dec 20 '12 at 07:01
6

In VC++, you can use _InterlockedCompareExchange.

do
   read the count
   perform mathematical operation
   interlockedcompareexchange( destination, updated count, old count)
until the interlockedcompareexchange returns the success code.

On other platforms/compilers, use the appropriate intrinsic for the LOCK CMPXCHG instruction that MS's _InterlockedCompareExchange exposes.

moonshadow
  • 86,889
  • 7
  • 82
  • 122
  • 1
    Or simply InterlockedIncrement/Decrement. – yrp Sep 18 '08 at 14:47
  • @yrp: is it faster than a do while loop/compare and swap? AFAIK, IntelockedIncrement/Decrement emit LOCK INC/LOCK DEC x86 instructions, while cas does not (at least LOCK is implicit for this instruction). – elmattic Nov 23 '11 at 15:16
3

Strictly speaking, you'll need to wait until C++0x to be able to write thread-safe code in pure C++.

For now, you can use Posix, or create your own platform independent wrappers around compare and swap and/or interlocked increment/decrement.

James Hopkin
  • 13,797
  • 1
  • 42
  • 71
  • 1
    Or do like everyone else and use Boost::TR1 – gnud Dec 08 '08 at 11:08
  • If you don't care about Win32, Posix is the best choice (fast and simple), IMO. Of course, nowadays main compilers support C++0x, which means you can use TR1 (which is a part of C++11 standard). You just have to turn the option on (C++0x compiler flag). – Pijusn Jan 26 '12 at 17:36
2

Win32 InterlockedIncrementAcquire and InterlockedDecrementRelease (if you want to be safe and care about platforms with possible reordering, hence you need to issue memory barriers at the same time) or InterlockedIncrement and InterlockedDecrement (if you are sure you will stay x86), are atomic and will do the job.

That said, Boost/TR1 shared_ptr<> will handle all of this for you, therefore unless you need to implement it on your own, you will probably do the best to stick to it.

Suma
  • 33,181
  • 16
  • 123
  • 191
1

Bear in mind that the locking is very expensive, and it happens every time you hand objects around between smart pointers - even when the object is currently owned by one thread (the smart pointer library doesn't know that).

Given this, there may be a rule of thumb applicable here (I'm happy to be corrected!)

If the follow things apply to you:

  • You have complex data structures that would be difficult to write destructors for (or where STL-style value semantics would be inappropriate, by design) so you need smart pointers to do it for you, and
  • You're using multiple threads that share these objects, and
  • You care about performance as well as correctness

... then actual garbage collection may be a better choice. Although GC has a bad reputation for performance, it's all relative. I believe it compares very favourably with locking smart pointers. It was an important part of why the CLR team chose true GC instead of something using reference counting. See this article, in particular this stark comparison of what reference assignment means if you have counting going on:

no ref-counting:

a = b;

ref counting:

if (a != null)
    if (InterlockedDecrement(ref a.m_ref) == 0)
            a.FinalRelease();

if (b != null)
    InterlockedIncrement(ref b.m_ref);

a = b;
Daniel Earwicker
  • 114,894
  • 38
  • 205
  • 284
  • What happens when another thread refs in between `InterlockedDecrement(...)` and `a.FinalRelease();`? – smokku Jul 24 '16 at 17:59
  • It's impossible in the design - this was generated code, not hand-written. The application code just says `a = b`. If two threads have access to the object sufficient to increment the count, then the count is already >= 2. Either one can decrement it, but in doing so it simultaneously gives up the ability to increment it. – Daniel Earwicker Jul 24 '16 at 21:13
  • What happens when another thread does `a = b` in between `InterlockedDecrement(...)` and `a.FinalRelease();`? – smokku Jul 24 '16 at 23:47
0

That particular code posted in that ddj article is adding extra complexity to account for bugs in using smart pointers.

Specifically, if you can't guarantee that the smart pointer won't change in an assignment to another smart pointer, you are doing it wrong or are doing something very unreliable to begin with. If the smart pointer can change while being assigned to another smart pointer, that means that the code doing the assignment doesn't own the smart pointer, which is suspect to begin with.

MSN
  • 53,214
  • 7
  • 75
  • 105
0

If the instruction itself is not atomic then you need to make the section of code that updates the appropriate variable a critical section.

i.e. You need to prevent other threads entering that section of code by using some locking scheme. Of course the locks need to be atomic, but you can find an atomic locking mechanism within the pthread_mutex class.

The question of efficient: The pthread library is as efficient as it can be and still guarantee that mutex lock is atomic for your OS.

Is it expensive: Probably. But for everything that requires a guarantee there is a cost.

Martin York
  • 257,169
  • 86
  • 333
  • 562