The "Treiber Stack" is generally one of the simplest lock-free data structures, and so it is often used when teaching introductions to lock-free algorithms.
I've seen many implementations of Treiber Stacks using C++ atomics. The algorithm itself is trivial, so the real challenge is handling all the other incidental details of lock-free data-structures, such as providing some way of performing safe memory reclamation, avoiding the ABA problem, and allocating nodes in a lock-free manner. This can be solved in various ways, such as using atomic reference counting, hazard pointers, counted/tagged pointers to avoid ABA, and using a lock-free memory pool.
But ignoring all those details and focusing on the simple algorithm itself, one question that occurred to me is the fact that every implementation of Treiber Stacks that I can recall implements the node class using atomic next pointers. For example:
struct Node
{
T value;
std::atomic<Node*> next;
};
But after thinking about the algorithm, I'm not sure why the next pointer needs to be atomic.
The general PUSH algorithm (ignoring lock-free allocation, safe memory reclamation, backoff, ABA avoidance, etc.) is:
Node* n = new Node();
Node* front = m_front.load();
n->next.store(front);
while (!m_front.compare_exchange_weak(front, n))
{
n->next.store(front);
}
The general POP algorithm (again, ignoring all details except the actual algorithmic logic) is:
Node* front = m_front.load();
Node* next = front->next.load();
while (!m_front.compare_exchange_weak(front, next))
{
next = front->next.load();
}
And here is an real-world example implementation of the PUSH algorithm:
https://github.com/khizmax/libcds/blob/master/cds/intrusive/treiber_stack.h#L736
So I don't understand why the next pointer even needs to be atomic. Most C++ implementations use relaxed loads/stores with the next
pointer, so we don't need any memory fences when reading/writing to the next pointer, but my thinking is that it doesn't need to be atomic at all.
From what I can see, at no time is the next pointer of any node concurrently written to. Rather, the next pointer may be concurrently loaded, but I never see any opportunity for the algorithm to concurrently load+store or concurrently store+store. In fact, in the PUSH algorithm, the next pointer is never accessed concurrently at all.
So it seems to me that next pointers are effectively "read-only" when accessed concurrently, so I'm not sure why it would even be necessary to make them atomic at all.
Yet, every C++ implementation of a Treiber Stack I've seen makes the next pointers to be atomic. So am I correct, or is there some reason the next pointer must be made atomic?