I'm working on a lock-free version of the "Simple Segregated Storage" memory pool in C++.
The SSS memory pool is similar to a slab allocator : it's basically just a chunk of memory that is divided into blocks of equal sizes, and we have a free list pointer pointing to the first available block. Allocating is simply moving the pointer up to the next block, and deallocating is simply setting the free list pointer to the deallocated block, and pointing the "next" pointer on the deallocated block to the old value of the freelist pointer.
So it's basically a singly-linked list.
Now, I'm trying to code a lock-free version of the Simple Segregated Storage algorithm. If we assume that SEGREGATING the initial block of memory (i.e. creating the linked list) is always done before entering a multi-threaded environment, we only need to worry about allocating and deallocating blocks - in which case this problem becomes very similar to a lock-free singly linked-list, which is a well understood problem.
So, it seems to me that allocating and deallocating could easily be done in a lock-free manner by using simple compare and swap instructions.
Assuming we have the following free list pointer:
std::atomic<unsigned char*> m_first
And assume we have a helper function, nextof()
, which takes in a block pointer and returns a reference to the "next pointer". The next pointer is actually embedded within the memory block - so the nextof
function looks like this:
unsigned char*& nextof(void* ptr)
{
return *static_cast<unsigned char**>(ptr);
}
That's at least how the Boost library implements Simple Segregated Storage.
Anyway, my attempt at a lock-free, atomic allocate
function is:
void* malloc()
{
unsigned char* first = m_first.load();
if (!first) return nullptr;
while (!m_first.compare_exchange_strong(first, nextof(first)))
{
if (!first) break;
}
return first;
}
So - this seems very straightforward. All we need to do is atomically set m_first
to the next pointer embedded within the block pointed to by m_first
. Initially, I thought this would be as simple as saying m_first.store(nextof(m_first))
- but unfortunately I don't think that's atomic because we're actually doing a store and a load here on the same memory location - we're storing a new value into m_first
, but also loading the value of m_first
in order to get it's next pointer.
So, it seems we need to do a Compare and Swap. In the above code, I atomically load the value of m_first
into a local variable. Then, after checking for null, I atomically change the value of m_first
with a CAS, if and only if it still equals the value of the local variable. If it doesn't, the local variable is updated to whatever m_first
is now, and we keep looping until we're not preempted by some other thread.
The deallocation is a bit trickier - here we need to change the value of m_first
to the deallocated block, and also update the next pointer on the deallocated block to point to what m_first
used to be.
My attempt at this is:
void free(void* chunk)
{
unsigned char* first = m_first.load();
nextof(chunk) = first;
while (!m_first.compare_exchange_strong(first, static_cast<unsigned char*>(chunk)))
{
nextof(chunk) = first;
}
}
I'm not quite sure I have this right, but I think it's correct. Firstly, I atomically load the value of m_first
and store it in a local variable. I then set the next pointer on the block-to-be-free'd to my local copy of m_first
. Of course, by now, another thread could have intervened and set m_first
to something else - so I have to now do a CAS operation to check that m_first
is still what I expect. If it's not, I have to set the next pointer to the appropriate value, and continue looping.
As far as I can see this is all race-condition safe, with a single CAS operation for malloc
and free
.
Question: Since lock-free algorithms are hard, I'm not certain I have this correct. Am I overlooking something here that could lead to a race condition?