Deadlock occuring after multiple iterations of queries in threads (multithreading)

Question

I encounter deadlocks while executing the code snippet below as a thread.

void thread_lifecycle(
Queue<std::tuple<int64_t, int64_t, uint8_t>, QUEUE_SIZE>& query, 
Queue<std::string, QUEUE_SIZE>& output_queue,
std::vector<Object>& pgs,
bool* pgs_executed, // Initialized to array false-values
std::mutex& pgs_executed_mutex,
std::atomic<uint32_t>& atomic_pgs_finished
){
    bool iter_bool = false;
    std::tuple<int64_t, int64_t, uint8_t> next_query;
    std::string output = "";
    int64_t lower, upper;

    while(true) {
        // Get next query
        next_query = query.pop_front();

        // Stop Condition reached terminate thread
        if (std::get<2>(next_query) == uint8_t(-1)) break;

        //Set query params
        lower = std::get<0>(next_query);
        upper = std::get<1>(next_query);


        // Scan bool array
        for (uint32_t i = 0; i < pgs.size(); i++){
            // first lock for reading
            pgs_executed_mutex.lock();
            if (pgs_executed[i] == iter_bool) {
                pgs_executed[i] = !pgs_executed[i];
                // Unlock and execute the query
                pgs_executed_mutex.unlock();
                output = pgs.at(i).get_result(lower,  upper);

                // If query yielded a result, then add it to the output
                if (output.length() != 0) {
                    output_queue.push_back(output);
                }

                // Inform main thread in case of last result
                if (++atomic_pgs_finished >= pgs.size()) {
                    output_queue.push_back("LAST_RESULT_IDENTIFIER");
                    atomic_pgs_finished.exchange(0);
                }
            } else {
                pgs_executed_mutex.unlock();
                continue;
            }
        }
        //finally flip for next query
        iter_bool = !iter_bool;
    }
}

Explained:

I have a vector of objects containing information which can be queried (similar to as a table in a database). Each thread can access the objects and all of them iterate the vector ONCE to query the objects which have not been queried and return results, if any.

In the next query it goes through the vector again, and so on... I use the bool* array to denote the entries which are currently queried, so that the processes can synchronize and determine which query should be executed next.

If all have been executed, the last thread having possibly the last results will also return an identifier for the main thread in order to inform that all objects have been queried.

My Question:

Regarding the bool* as well as atomic-pgs_finished, can there be a scenario, in-which a deadlock can occur. As far as i can think, i cannot see a deadlock in this snippet. However, executing this and running this for a while results into a deadlock.

I am seriously considering that a bit (byte?) has randomly flipped causing this deadlock (on ECC-RAM), so that 1 or more objects actually were not executed. Is this even possible?

Maybe another implementation could help?

Edit, Implementation of the Queue:

template<class T, size_t MaxQueueSize>
class Queue
{
    std::condition_variable consumer_, producer_;
    std::mutex mutex_;
    using unique_lock = std::unique_lock<std::mutex>;

    std::queue<T> queue_;

public:
    template<class U>
    void push_back(U&& item) {
        unique_lock lock(mutex_);
        while(MaxQueueSize == queue_.size())
            producer_.wait(lock);
        queue_.push(std::forward<U>(item));
        consumer_.notify_one();
    }

    T pop_front() {
        unique_lock lock(mutex_);
        while(queue_.empty())
            consumer_.wait(lock);
        auto full = MaxQueueSize == queue_.size();
        auto item = queue_.front();
        queue_.pop();
        if(full)
            producer_.notify_all();
        return item;
    }
};

I never lock mutexes directly but use std::unique_lock. I don't know if it will matter here. But for instance, could pgs.size() decrease? If so, you're accessing it without a lock on it, but then you're using the index, and that could throw an exception. Now your mutex is locked. — Joseph Larson, Mar 15 '22 at 13:42
No pgs.size() remains constant over the lifetime of the threads. I'll try std::unique_lock and will report back! — Luxii, Mar 15 '22 at 13:45
`next_query = query.pop_front();` -- Will multiple threads call your function? If so, this is unprotected. — PaulMcKenzie, Mar 15 '22 at 13:51
`bool* pgs_executed, // Initialized to array false-values` -- Where is the guarantee that this pointer points to somewhere valid, in addition to pointing somewhere with at least `pgs.size` elements? Why not pass a `vector` or if not that, a `std::deque` if `vector` is troublesome? — PaulMcKenzie, Mar 15 '22 at 13:57
@PaulMcKenzie Sorry! I forgot to add the implementation of the Queue! This should be threadsafe! — Luxii, Mar 15 '22 at 13:58
@PaulMcKenzie My first thought while using vector was that a random flip occurs, so i changed the implementation to bool*. In both cases a deadlock occurs — Luxii, Mar 15 '22 at 13:59
`bool iter_bool = false; ... iter_bool = !iter_bool;` -- I'm wary of code that uses non-atomic variables as flags within a multithreaded scenario. — PaulMcKenzie, Mar 15 '22 at 14:02
What's the pgs_executed and iter_bool anyway? Is that supposed to be a mask what objects the query should apply to? — Goswin von Brederlow, Mar 15 '22 at 14:08
can you build with a thread sanitizer, which can tell more about the deadlock, other than an observation? see [this](https://stackoverflow.com/questions/17517824) question for help — quidstone, Mar 15 '22 at 14:28
@JosephLarson it did not help using std::unique_lock. It still occurs — Luxii, Mar 15 '22 at 14:42
@PaulMcKenzie i have set the iter_bool as a atomic variable, however, the deadlock still happens. But this variable is per thread so it would be new to me that this needs to be done. — Luxii, Mar 15 '22 at 14:43
@quidstone Thank you for the suggestion! I did not knew gcc has such an option. I am currently running it. But it will probably take many hours until it gets to the point were it usually deadlocks (~600-900 queries), since the code now runs very slow — Luxii, Mar 15 '22 at 14:46
It helps to document which mutex protects which data structure and also which event has which meaning. Only with that documented, you can say whether a piece of code contradicts that plan or not. Also, please extract and provide a [mcve]. There's too much guessing on here at the moment. — Ulrich Eckhardt, Mar 15 '22 at 14:47
@GoswinvonBrederlow These variables are needed to tell the other threads on which object it should NOT execute the query. It makes no sense to parallelize the queries on the objects if every object is execute "threadmany" times. It is enough to call it once on each object -> hence the pgs_executed bool array, telling the other threads where it already was executed. The iter-bool is simply for performance-reasons. Simply flip it and you can reuse the bool array for the next query — Luxii, Mar 15 '22 at 14:49
Try making your queue push and pop methods just use a single condition variable rather than separate ones. That might have absolutely nothing to do with it, but it looked a little fishy. — Joseph Larson, Mar 15 '22 at 14:51
@Luxii Unless the query takes a long time per object that is a horrible design. 512 objects each will share a cacheline and access to the bit will be horrible slow. It would be much better to split the objects into groups and give each thread a group. — Goswin von Brederlow, Mar 15 '22 at 17:35
@GoswinvonBrederlow Sadly, i cannot pre-determine how long a query could take on an object. The runtime for an object differs on different queries. Sometimes an object can be fast, other times slow. To accommodate for this, i let each Thread pick an object, to execute the query. In this case the slowest running query on a object sets the overall query-time, while in the meantime the others objects can return results. I agree that grouping/binning the objects for threads would be better, but this can lead to a group where more than one objects takes a long time, increasing the overall runtime — Luxii, Mar 16 '22 at 07:51
You could use a work-stealing-queue. So when one thread gets stuck on a long query the blocked object get redistributed. Another method I found useful in the past was to make groups of decreasing size. So at the start each thread gets a big chunk of objects. Whoever finishes first gets a smaller chunk next. Towards the end threads might get single objects. If a thread hits an expensive object at the start that is ok because there is plenty of stuff in the queue for the other threads. And at the end an expensive object won't block many (or no) other objects. — Goswin von Brederlow, Mar 16 '22 at 11:18

score 0 · Answer 1 · answered Mar 15 '22 at 15:42

Thanks to @Ulrich Eckhardt (, @PaulMcKenzie and all the other comments, thank you for the brainstorming!). I probably have found the cause of the deadlock. I tried to reduce this example even more and thought on removing atomic_pgs_finished, a variable indicating whether all pgs have been queried. Interestingly: ++atomic_pgs_finished >= pgs.size() returns not only once but multiple times true, so that multiple threads are in this specific if-clause.

I simply fixed it by using another mutex around this if-clause. Maybe someone can explain why ++atomic_pgs_finished >= pgs.size() is not atomic and causes true for multiple threads.

Below i have updated the code (mostly the same as in the question) with comments, so that it might be more understandable.

void thread_lifecycle(
Queue<std::tuple<int64_t, int64_t, uint8_t>, QUEUE_SIZE>& query, // The input queue containing queries, in my case triples
Queue<std::string, QUEUE_SIZE>& output_queue, // The Output Queue of results
std::vector<Object>& pgs, // Objects which should be queried
bool* pgs_executed, // Initialized to an array of false-values
std::mutex& pgs_executed_mutex, // a mutex, protecting pgs_executed
std::atomic<uint32_t>& atomic_pgs_finished // atomic counter to count how many have been executed (to send a end signal)
){
    // Initialize variables
    std::tuple<int64_t, int64_t, uint8_t> next_query;
    std::string output = "";
    int64_t lower, upper;

    // Set the first iteration to false for the very first query
    // This flips on the second iteration to reuse pgs_executed with true values and so on...
    bool iter_bool = false;

    // Execute as long as valid queries are received
    while(true) {
        // Get next query
        next_query = query.pop_front();

        // Stop Condition reached terminate thread
        if (std::get<2>(next_query) == uint8_t(-1)) break;

        // "Parse query" to query the objects in pgs
        lower = std::get<0>(next_query);
        upper = std::get<1>(next_query);


        // Now iterate through the pgs and pgs_executed (once)
        for (uint32_t i = 0; i < pgs.size(); i++){
            // Lock to read and write into pgs_executed
            pgs_executed_mutex.lock();
            if (pgs_executed[i] == iter_bool) {
                pgs_executed[i] = !pgs_executed[i];
                // Unlock since we now execute the query on the object (which was not queried before)
                pgs_executed_mutex.unlock();

                // Query Execution
                output = pgs.at(i).get_result(lower,  upper);

                // If the query yielded a result, then add it to the output for the main thread to read
                if (output.length() != 0) {
                    output_queue.push_back(output);
                }

                // HERE THE ROOT CAUSE OF THE DEADLOCK HAPPENS
                // Here i would like to inform the main thread that we exexuted the query on 
                // every object in pgs, so that it should no longer wait for other results
                if (++atomic_pgs_finished >= pgs.size()) {
                    // In this if-clause multiple threads are present at once!
                    // This is not intended and causes a deadlock,  push_back-ing
                    // multiple times "LAST_RESULT_IDENTIFIER" in-which the main-thread 
                    // assumed that a query has finished. The main thread then simply added the next query, while the 
                    // previous one was not finished causing threads to race each other on two queries simultaneously
                    // and not having the same iter_bool!
                    output_queue.push_back("LAST_RESULT_IDENTIFIER");
                    atomic_pgs_finished.exchange(0);
                }
                // END: HERE THE ROOT CAUSE OF THE DEADLOCK HAPPENS
            } else {
                // This case happens when the next element in the list was already executed (by another process), 
                // simply unlock pgs_executed and continue with the next element in pgs
                pgs_executed_mutex.unlock();
                continue; // This is uneccessary and could be removed
            }
        }
        //finally flip for the next query in order to reuse bool* (which now has trues if a second query is incoming)
        iter_bool = !iter_bool;
    }
}

What does the asm code for `(++atomic_pgs_finished >= pgs.size())` look like? — Goswin von Brederlow, Mar 15 '22 at 17:39
The asm code can be found [here](https://godbolt.org/z/E758bsfsP). However, keep in mind that i am compiling this for two Intel(R) Xeon(R) Gold 6130 CPUs. — Luxii, Mar 16 '22 at 07:43

Deadlock occuring after multiple iterations of queries in threads (multithreading)

1 Answers1