I'm trying to wrap my head around lock-free reference counting (LFRC) algorithms, but I keep finding examples that I assume work without being able to understand why. I'm focused on copying and deletion. There are three in particular that all follow the same general pattern (in pseudo-code):
Copy {
    shared = source->shared;
    if (shared != null) {
        Atomically_Increment_Counter(shared);
    }
}

Delete {
    if (shared != null) {
        oldcounter = Atomically_Decrement_Counter(shared);
        if (oldcounter == value signifying all references released) {
            Deallocate(shared);
        }
    }
}
The first is from the 2004 paper Lock-Free Reference Counting by David L. Detlefs, figure 2, page 8 (edited for formatting):
void LFRCCopy(SNode **v, SNode *w) {
    SNode *oldv = *v;
    if (w != null)
        add_to_rc(w, 1);
    *v = w;
    LFRCDestroy(oldv);
}

void LFRCDestroy(SNode *v) {
    if (v != null && add_to_rc(v, -1) == 1) {
        LFRCDestroy(v->L);
        LFRCDestroy(v->R);
        delete v;
    }
}
Where add_to_rc is an atomic fetch-and-add that returns the previous value. The second is from the Mat implementation in OpenCV 4 (edited for brevity):
// Line 405
Mat::Mat(const Mat& m)
    : flags(m.flags), dims(m.dims), rows(m.rows), cols(m.cols), data(m.data),
      datastart(m.datastart), dataend(m.dataend), datalimit(m.datalimit), allocator(m.allocator),
      u(m.u), size(&rows), step(0)
{
    if( u )
        CV_XADD(&u->refcount, 1);
    ...
}

// Line 551
void Mat::release()
{
    if( u && CV_XADD(&u->refcount, -1) == 1 )
        deallocate();
    u = NULL;
    ...
}
Where CV_XADD is an atomic fetch-and-add that returns the previous value. And the third is from the std::shared_ptr implementation that ships with libstdc++ in the most recent version of GCC (10.2); this implementation is very complex and I've edited it down a lot for brevity:
class _Sp_counted_base {
    void _M_add_ref_copy()
    { __gnu_cxx::__atomic_add_dispatch(&_M_use_count, 1); }

    // Line 161 for full implementation
    void _M_release() noexcept
    {
        _GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&_M_use_count);
        if (__gnu_cxx::__exchange_and_add_dispatch(&_M_use_count, -1) == 1)
        {
            _GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&_M_use_count);
            _M_dispose();
            if (_Mutex_base<_Lp>::_S_need_barriers)
                __atomic_thread_fence (__ATOMIC_ACQ_REL);
            // editor's note: weak pointer handling removed for brevity
            _M_destroy();
        }
    }
};

class __shared_count {
    __shared_count(const __shared_count& __r) noexcept
    : _M_pi(__r._M_pi)
    {
        if (_M_pi != nullptr)
            _M_pi->_M_add_ref_copy();
    }

    ~__shared_count() noexcept
    {
        if (_M_pi != nullptr)
            _M_pi->_M_release();
    }

    _Sp_counted_base* _M_pi;
};

class __shared_ptr {
    __shared_ptr(const __shared_ptr&) noexcept = default;
    ~__shared_ptr() = default;

    element_type* _M_ptr;        // Contained pointer.
    __shared_count _M_refcount;  // Reference counter.
};
Some explanation of that last one, since it's kind of indirect:

- Copy: __shared_ptr's default copy constructor copies _M_refcount, whose copy constructor shares the same _Sp_counted_base as the original and atomically increments the reference counter.
- Delete: __shared_ptr's default destructor destroys _M_refcount, whose destructor calls _M_release on the _Sp_counted_base, which atomically decrements and possibly deallocates the reference counter.
Anyways, all this boils down to one question, which is the crux of my failure to understand why these work (even that old Dr. Dobbs article and the related questions here on SO [I feel like the answer might be here but I can't see it] raised more questions than answers for me):

In the copy operation, what prevents a race condition where another thread performs the delete operation on the last reference (possibly via another view of the shared object) and deallocates the object between the start of the copy operation and the atomic counter increment? From what I (mis)understand, that would cause copy to increment a deallocated counter and crash or dump core everywhere, or something.
That is, referring to the OpenCV implementation above (since examining it is actually what started this whole mini research project of mine):
Mat::Mat(const Mat& m)
    : flags(m.flags), dims(m.dims), rows(m.rows), cols(m.cols), data(m.data),
      datastart(m.datastart), dataend(m.dataend), datalimit(m.datalimit), allocator(m.allocator),
      u(m.u), // <--- START OF DANGER ZONE
      size(&rows), step(0)
{
    if( u )
        // <--- ALMOST AT END OF DANGER ZONE
        CV_XADD(&u->refcount, 1);
    ...
}

void Mat::release()
{
    if( u && CV_XADD(&u->refcount, -1) == 1 )
        deallocate();
    u = NULL;
    ...
}
What keeps another thread from releasing the last reference via m in the marked "danger zone" of the copy constructor, leaving u non-null but pointing to post-deallocation garbage?
What am I missing here? I feel like it might be obvious and maybe I've been awake too long. In particular, the fact that the libstdc++ implementation also seems to follow this pattern gives it a ton of street cred; I just can't understand the details.