Caveat: I don't know C#, but I do know C++ with `std::atomic` and atomic stuff in x86 asm, and your code (and the function/method names) seems pretty readable/clear, so I think I understand what's going on.
You're implementing something similar to C++11's `std::shared_ptr`, so you could look at implementations of it for inspiration. (This class is like the back-end control block that separate `shared_ptr` objects sharing the same pointer refer to.)
If two threads both run `UnregisterShare` and bring the ref count down below zero, they'll both try to `.Dispose()`. That's a bug, similar to a double-free or double-unlock. You could check for it, or paper it over by changing the code to `==` instead of `<=` so only one thread runs `.Dispose()`. `<=` looks like the worst of both worlds: misbehaviour that could be hard to identify.
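I don't write C#, so treat this as a rough sketch of the `==` version with an explicit underflow check; the class and field names (`SharedHandle`, `refCount`, `resource`) are my assumptions, not from your code:

```csharp
using System;
using System.Threading;

// Minimal sketch of the control-block class; names are hypothetical.
public sealed class SharedHandle
{
    private int refCount = 1;              // constructor takes the first reference
    private readonly IDisposable resource;

    public SharedHandle(IDisposable resource) => this.resource = resource;

    public void UnregisterShare()
    {
        int newValue = Interlocked.Decrement(ref refCount);
        if (newValue == 0)                 // == instead of <=: exactly one thread Disposes
            resource.Dispose();
        else if (newValue < 0)             // underflow means a double-release bug
            throw new InvalidOperationException("refcount underflow: double-release?");
    }
}
```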
> in `TryRegisterShare`, is the first assignment to `prevValue` guaranteed to be larger than zero unless all strong references were released?

Barring bugs like failure to call `RegisterShare` when taking a reference, or double-release bugs, yes, I think so. It would be prudent to use `if (prevValue <= 0) return false;` there, to make sure you bail out in double-release situations where something has left the refcount negative.
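In code, the shape I mean (reusing the hypothetical `refCount` field from the sketch above):

```csharp
// Returns false instead of resurrecting a dead (or double-released) object.
public bool TryRegisterShare()
{
    int prevValue = Volatile.Read(ref refCount);
    while (true)
    {
        if (prevValue <= 0)        // 0 = already dead, negative = double-release bug
            return false;
        int seen = Interlocked.CompareExchange(ref refCount, prevValue + 1, prevValue);
        if (seen == prevValue)
            return true;           // our increment took effect
        prevValue = seen;          // lost the race; retry with the value we saw
    }
}
```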
A cmpxchg loop doesn't seem ideal, but if you unconditionally increment and just check whether you must have started at zero, that could fool other threads, e.g. with this sequence of events (sketched in code after the list):

- thread A->`UnShare` (refcount drops to 0, so a Dispose is coming)
- thread B->`TryShare` (failure detected, but leaves refcount=1 temporarily)
- thread C->`TryShare` succeeds (it saw B's temporary 1)
- thread A->`Dispose`, leaving thread C holding a reference to a dead object.
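For concreteness, here's the flawed unconditional-increment version I mean (my own sketch, not your code):

```csharp
// Broken: between the Increment and the Decrement, the count reads 1,
// so another thread's TryShare can "succeed" on a dying object.
public bool TryShareFlawed()
{
    if (Interlocked.Increment(ref refCount) == 1)   // old value was 0: object is dead
    {
        Interlocked.Decrement(ref refCount);        // undo, but too late for thread C
        return false;
    }
    return true;
}
```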
I haven't looked at how (the Linux / gcc implementation of) C++11 `weak_ptr.lock()` implements promotion to a `shared_ptr` (and I'm curious now!). I wonder if they use cmpxchg in a loop (in the asm) or if they do some kind of dance that avoids that. E.g. maybe `UnShare`, upon detecting refcount==0, could use a cmpxchg loop to modify the refcount from zero to -2^31 before calling `Dispose`. (And if it detects refcount >= 1 in that loop, it stops trying to kill it.) Then I think `TryShare` would succeed as long as it saw `Interlocked.Increment(ref refCount) >= 1`, since that means that any running `UnShare`'s cmpxchg hasn't already succeeded, and won't succeed.

For some use-cases, it might not be desirable for `TryShare` to succeed in that case. You could just have it call `UnShare` to decrement the count again if its increment found the old value was zero, as in the sketch below.
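Putting those two paragraphs together, a sketch of what I mean (using the shorthand names `UnShare`/`TryShare` from above; `int.MinValue` stands in for -2^31; all names hypothetical):

```csharp
public void UnShare()
{
    if (Interlocked.Decrement(ref refCount) == 0)
    {
        // One cmpxchg is enough here: if it fails, a TryShare already pushed the
        // count back to >= 1, so we stop trying to kill the object.
        if (Interlocked.CompareExchange(ref refCount, int.MinValue, 0) == 0)
            resource.Dispose();
    }
}

// Permissive version: succeeds even when it resurrects the count from exactly 0.
public bool TryShare()
{
    if (Interlocked.Increment(ref refCount) >= 1)
        return true;                          // UnShare's 0 -> int.MinValue cmpxchg can no longer win
    Interlocked.Decrement(ref refCount);      // count was already killed: undo and fail
    return false;
}

// Stricter variant: refuse to resurrect from zero by handing the reference
// straight back via UnShare, which may then Dispose.
public bool TryShareStrict()
{
    int newValue = Interlocked.Increment(ref refCount);
    if (newValue > 1)
        return true;
    if (newValue == 1)
        UnShare();                            // old value was 0: give it back
    else
        Interlocked.Decrement(ref refCount);  // count was already killed
    return false;
}
```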
So I think there needs to be a cmpxchg loop somewhere, but if my logic is correct, you can get away with putting it in the refcount-drops-to-zero path of `UnShare` instead of in the presumably-hotter `TryRegisterShare`.
`RegisterShare` looks safe if you can absolutely guarantee that the refcount wasn't zero beforehand. This should be the case in a thread which already has a ref. For bug-checking, you could check the return value of the increment to catch cases where you accidentally reanimate a dead object (i.e. one that another thread is about to Dispose, or may already have Disposed).
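E.g. with a debug-only assertion (`Debug.Assert` compiles away in release builds; again just a sketch on the hypothetical field):

```csharp
using System.Diagnostics;

public void RegisterShare()
{
    int newValue = Interlocked.Increment(ref refCount);
    // Caller must already hold a reference, so the count must have been >= 1 before.
    Debug.Assert(newValue > 1, "RegisterShare reanimated a dead or dying object");
}
```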
In C++, if multiple threads were reading and writing the same `shared_ptr`, this assumption could be violated. It would require a `std::atomic<std::shared_ptr<int>> foo;` for that to be safe, though, and that won't compile because `shared_ptr` is not trivially copyable.
So C++ protects itself from unlocked concurrent access to the reference-wrapper objects (by declaring it Undefined Behaviour); you should, too. Otherwise another thread with a reference to the same `shared_ptr` object this thread is copying might have called `.reset()` on it, and so might decrement the refcount in the control block after this thread reads the pointer to the control block, but right before this thread increments the control-block refcount.
I just noticed that the title question is about needing extra fences or volatiles:
The decrement in `UnShare` needs to become globally visible before anything that `.Dispose()` does, and it needs to be a release operation to make sure loads/stores into the shared object become globally visible before the decrement of the refcount. Your code already achieves that, because `Interlocked.anything` has sequential-consistency semantics in .NET on any architecture (not just on Windows), so nothing can reorder across it in either direction.
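So a release/Dispose path needs no extra fences; a tiny sketch (assumed names, as before):

```csharp
public void ReleaseAfterWrite()
{
    sharedObject.LastValue = 42;                  // ordinary store into the shared object
    if (Interlocked.Decrement(ref refCount) == 0) // full fence: the store above is
        sharedObject.Dispose();                   // visible before the decrement
}
```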
AlexRP's blog post about the .NET memory model explains lots of detail about this and related things (`Interlocked`, `Thread.VolatileRead`/`Write`, `Volatile.Read`/`Write`, and the fact that simple loads or stores on types up to the width of `IntPtr.Size` are automatically atomic (but not synchronized)). Fun fact: in Microsoft's implementation for x86, `Thread.VolatileRead` and `Write` are surrounded on either side by MFENCE instructions, but the language standard only requires acquire or release semantics (which x86 gives you for free, with no barrier). Mono doesn't use barriers for these functions, but MS can't remove them until everyone's buggy code stops relying on that implementation detail.
Other than the decrement, I think everything else just needs atomicity for your ref-counting, not synchronization of any other operations. The ref-counting doesn't synchronize access to the shared object from multiple threads, so a `TryShare`/`UnShare` pair doesn't form a critical section (which would need acquire/release semantics). From the constructor to the final `UnShare` that results in a `.Dispose()`, this class is hands-off as far as synchronization. In C++, I think you could use `memory_order_relaxed` for the `RegisterShare` increment and the `TryShare` cmpxchg; on some weakly-ordered architectures, like ARM, this would require fewer barriers. (That's not an option in C#: you only have the sequential-consistency `Interlocked` operations, which require full barriers on ARM. There's no extra cost on x86, though, since `lock`ed read-modify-write instructions on x86 are already full memory barriers like MFENCE (`Thread.MemoryBarrier`).)