Multithread queue atomic operations

Question

I'm playing with the std::atomic structures and wrote this lock-free multi-producer multi-consumer queue, which I'm attaching here. The idea for the queue is based on two stacks - a producer and consumer stack, which are essentially linked list structures. The nodes of the lists hold the indexes into an array of that holds the actual data, where you would read or write.

The idea would be that the nodes for the lists are mutually exclusive, ie, a pointer to a node can exist only in the producer or the consumer list. A producer would attempt to acquire a node from the producer list, a consumer from the consumer list and whenever a pointer to a node is acquired by either producer or consumer, it should be out of both lists so that noone else could acquire it. I'm using std::atomic_compare_exchange functions to spin until a node is popped.

The problem is that there must be something wrong with the logic or the operations are not atomic as I assume them to be because even with 1 producer and 1 consumer, given enough time, the queue will livelock and what I have noticed is that if you assert that cell != cell->m_next, the assert would get hit ! So its probably something staring me in the face and I just don't see it, so I wonder if someone could pitch in.

Thx

#ifndef MTQueue_h
#define MTQueue_h

#include <atomic>

template<typename Data, uint64_t queueSize>
class MTQueue
{
public:

    MTQueue() : m_produceHead(0), m_consumeHead(0)
    {
        for(int i=0; i<queueSize-1; ++i)
        {
            m_nodes[i].m_idx = i;
            m_nodes[i].m_next = &m_nodes[i+1];
        }
        m_nodes[queueSize-1].m_idx = queueSize - 1;
        m_nodes[queueSize-1].m_next = NULL;

        m_produceHead = m_nodes;
        m_consumeHead = NULL;
    }

    struct CellNode
    {
        uint64_t m_idx;
        CellNode* m_next;
    };

    bool push(const Data& data)
    {
        if(m_produceHead == NULL)
            return false;

        // Pop the producer list.
        CellNode* cell = m_produceHead;
        while(!std::atomic_compare_exchange_strong(&m_produceHead,
                                                   &cell, cell->m_next))
        {
            cell = m_produceHead;
            if(!cell)
                return false;
        }

        // At this point cell should point to a node that is not in any of the lists
        m_data[cell->m_idx] = data;

        // Push that node as the new head of the consumer list
        cell->m_next = m_consumeHead;
        while (!std::atomic_compare_exchange_strong(&m_consumeHead,
                                                    &cell->m_next, cell))
        {
            cell->m_next = m_consumeHead;
        }
        return true;
    }

    bool pop(Data& data)
    {
        if(m_consumeHead == NULL)
            return false;

        // Pop the consumer list
        CellNode* cell = m_consumeHead;
        while(!std::atomic_compare_exchange_strong(&m_consumeHead,
                                                   &cell, cell->m_next))
        {
            cell = m_consumeHead;
            if(!cell)
                return false;
        }

        // At this point cell should point to a node that is not in any of the lists
        data = m_data[cell->m_idx];

        // Push that node as the new head of the producer list
        cell->m_next = m_produceHead;
        while(!std::atomic_compare_exchange_strong(&m_produceHead,
                                                   &cell->m_next, cell))
        {
            cell->m_next = m_produceHead;
        }
        return true;
    };

private:

    Data m_data[queueSize];

    // The nodes for the two lists
    CellNode m_nodes[queueSize];

    volatile std::atomic<CellNode*> m_produceHead;
    volatile std::atomic<CellNode*> m_consumeHead;
};

#endif

This looks like an ABA issue to me: https://stackoverflow.com/questions/25628225/where-is-the-race/25628314#25628314 — dohashi, Sep 18 '14 at 19:16
ABA is pretty hard to test for. I'd suggest really understanding the ABA problem, and then thinking through your algorithm to see if it is vulnerable. Modelling your data structure as a state machine may help as well. Atomic data structures can be very tricky to do properly. — dohashi, Sep 18 '14 at 19:46
I thought ABA would be avoided since the stack CellNode m_next is updated on each fail. Something else I'm noticing is that the atomic compare exchange seems to fail even if the compared variables are the same. So, if you break in the while loop, the variables would be the same, yet, there you are ... I wonder if this could this be an issue with memory barriers ? — bitwise, Sep 18 '14 at 19:55
@bitwise: On an unrelated note; why are you declaring your data as `volatile`? Isn't the memory synchronization guarantees given by `atomic` enough? Do you really have to fall back on slower and more cumbersome semantics of `volatile` as well? Am I missing something? — yzt, Sep 18 '14 at 20:49
Hey yzt,That's just because I was trying out if setting it as volatile would fix the issue. It did not :/ — bitwise, Sep 18 '14 at 20:51
On that note, I'm still thinking if the possibility of a compiler optimizing away local variables could be the core of the issue. It seems very strange that the std::atomic_compare_exchange_strong would fail, yet when you break where it fails, the comparison should have been true. I fail to see how this would be an ABA issue, since the owning of a node should be mutually exclusive, it shouldn't be possible for a thread to change the node thats currently out of both lists... Or maybe somewhere in there hides the problem that I'm not seeing... — bitwise, Sep 18 '14 at 20:53
@bitwise it could be that when the debugger hits a breakpoint, that it synchronizes all the memory, so you don't actually see the memory synchronization issues. — programmerjake, Sep 18 '14 at 21:06
@programmerjake it certainly could. The other clue I have for where the bug is that if you put assert(cell != cell->m_next) right after it has been assigned, the assert does get hit, which is of course disastrous in this case. — bitwise, Sep 18 '14 at 21:27
There's definitely an ABA problem when there's more than one producer and consumer, I think I can see that, but these short comments are ...too short to explain. A bit stumped in seeing how it could happen with just 1 producer and consumer, though. I suspect the issue is that accessing the next pointer is not an atomic operation; and whether the _strong() calls provide a sufficient fence for them to be atomic. — Sam Varshavchik, Sep 19 '14 at 00:19
That's what I'm thinking. The cell->m_next is definitely not atomic and between it's assignment and the while loop there's a chance it could be changed. Thats the case for both before and inside the while loop. Thing is, that's why the while loop is there in the first place. Hmm.... It's a strange little bug, I'll keep digging... — bitwise, Sep 19 '14 at 00:41

score 1 · Answer 1 · answered Sep 20 '14 at 00:58

I believe I was able to crack this one. No livelock at 1000000 writes/reads for queues from size 2 to 1024 and from 1 producer and 1 consumer to 100 producers / 100 consumers.

Here's the solution. The trick is not to use cell->m_next directly in the compare and swap (the same applies for the producer code by the way) and to require strict memory order rules:

This seems to confirm my suspicion that it was compiler reordering of the reads writes. Here's the code:

bool push(const TData& data)
{
    CellNode* cell = m_produceHead.load(std::memory_order_acquire);
    if(cell == NULL)
        return false;

    while(!std::atomic_compare_exchange_strong_explicit(&m_produceHead,
                                                        &cell,
                                                        cell->m_next,
                                                        std::memory_order_acquire,
                                                        std::memory_order_release))
    {
        if(!cell)
            return false;
    }

    m_data[cell->m_idx] = data;

    CellNode* curHead = m_consumeHead;
    cell->m_next = curHead;
    while (!std::atomic_compare_exchange_strong_explicit(&m_consumeHead,
                                                         &curHead,
                                                         cell,
                                                         std::memory_order_acquire,
                                                         std::memory_order_release))
    {
        cell->m_next = curHead;
    }

    return true;
}

bool pop(TData& data)
{
    CellNode* cell = m_consumeHead.load(std::memory_order_acquire);
    if(cell == NULL)
        return false;

    while(!std::atomic_compare_exchange_strong_explicit(&m_consumeHead,
                                                        &cell,
                                                        cell->m_next,
                                                        std::memory_order_acquire,
                                                        std::memory_order_release))
    {
        if(!cell)
            return false;
    }

    data = m_data[cell->m_idx];

    CellNode* curHead = m_produceHead;
    cell->m_next = curHead;
    while(!std::atomic_compare_exchange_strong_explicit(&m_produceHead,
                                                        &curHead,
                                                        cell,
                                                        std::memory_order_acquire,
                                                        std::memory_order_release))
    {
        cell->m_next = curHead;
    }

    return true;
};

Sorry you are posting essentially dangerous/broken code. There are lots of problems with it, eg. fail memory order never can be release. — Yuki, Dec 28 '19 at 23:36

Casey · Accepted Answer · 2014-11-05T00:39:40.280

I see a few problems with your queue implementation:

It's not a queue, it's a stack: the most recent item pushed is the first item popped. Not that there's anything wrong with stacks, but it's confusing to call it a queue. In fact it is two lock-free stacks: one stack that is initially populated with the array of nodes, and another stack that stores actual data elements using the first stack as a list of free nodes.
There is a data race on CellNode::m_next in both push and pop (unsurprisingly, since they both do the same thing, i.e., pop a node from one stack and push that node onto the other). Say two threads simultaneously enter e.g. pop and both read the same value from m_consumeHead. Thread 1 races ahead successfully popping and sets data. Then Thread 1 writes the value of m_produceHead into cell->m_next while Thread 2 is simultaneously reading cell->m_next to pass to std::atomic_compare_exchange_strong_explicit. The simultaneous non-atomic read and write of cell->m_next by two threads is by definition a data race.

This is what is known as a "benign" race in the concurrency literature: a stale/invalid value is read, but never gets used. If you are confident that your code will never need to run on an architecture where it could cause fiery explosions you may ignore it, but for strict conformance with the Standard memory model you need to make m_next an atomic and use at least memory_order_relaxed reads to eliminate the data race.
ABA. The correctness of your compare-exchange loops is based on the premise that an atomic pointer (e.g., m_produceHead and m_consumeHead) having the same value at both the initial load and the later compare-exchange implies that the pointee object must therefore be unchanged as well. This premise does not hold in any design in which it is possible to recycle an object faster than some thread makes a trip through its compare-exchange loop. Consider this sequence of events:
1. Thread 1 enters pop and reads the value of m_consumeHead and m_consumeHead->m_next but blocks before calling the compare-exchange.
2. Thread 2 successfully pops that node from m_consumeHead and blocks as well.
3. Thread 3 pushes several nodes onto m_consumeHead.
4. Thread 2 unblocks and pushes the original node onto m_produceHead.
5. Thread 3 pops that node from m_produceHead, and pushes it back onto m_consumeHead.
6. Thread 1 finally unblocks and calls the compare-exchange function, which succeeds since the value of m_consumeHead is the same. It pops the node - which is all well and good - but sets m_consumeHead to the stale m_next value it read back in step 1. All the nodes pushed by Thread 3 in the meantime are leaked.

Excellent observations ! Thank you, I believe you are correct. If time permits I will revise the design and update this thread. — bitwise, Sep 23 '14 at 01:12

Multithread queue atomic operations

2 Answers2

Linked