Access pthread shared std:map without data race

Question

My scenario is to have a main thread and tens of worker threads. Worker threads will process incoming messages from different ports.

What I want to do is to have main and worker threads share a same map, the worker threads save data into map (in different bucket). And the main thread grep the map content periodically.

The code goes like:

struct cStruct
{
    std::map<string::string> map1; 
    pthread_mutex_t mutex1;
    pthread_mutex_t mutex2;  
};

int main(){

    struct cStruct cStruct1;   
    while (condition){
        pthread_t th;
        int th_rc=pthread_create(&th,NULL,&task,(void *) &cStruct1);
    }

}

void* task(void* arg){
    struct cStruct cs = * (struct cStruct*) arg;

    while (coming data){
        if (main_thread_work){
            pthread_cond_wait(&count_cond, &cs.mutex1)
        }  

        pthread_mutex_lock(&cs.mutex1);
        // add a new bucket to the map
        cs.map1(thread_identifier)=processed_value;
        pthread_mutex_unlock(&cs.mutex1);

    }

void* main_thread_task(void* arg){

    sleep(sleep_time);
    main_thread_work = true;
    pthread_mutex_lock(&cs.mutex1);
    // main_thread reads the std::map
    main_thread_work = false;
    pthread_cond_broadcast(&count_cond, &cs.mutex1)
    pthread_mutex_unlock(&cs.mutex1);    
}

My questions are:

For map size change, I should use lock to protect the map. But for map with certain key update, can I let different threads modify the map concurrently? (assume no two identical buckets of map will be accessed at same time)

For the main thread greps the map, I thought of use conditional wait to hold all the worker threads while main thread is grepping the map content, and do a pthread_cond_broadcast to wake then up. The problem is that if a worker thread is updating map while main starts to work, there will be data race.

Please share some ideas to help me improve my design.

Edit 1: Add main_thread_task(). The thing I want to avoid is worker thread arriving pthread_cond_wait "after" pthread_cond_broadcast and the logic goes wrong.

So I false the main_thread_work before main thread broadcasts workers thread.

Having a mutex inside a struct to protect it is wrong (you need a mutex around the struct). — , Sep 07 '15 at 18:50
Do you mean that I should not contain the lock inside the struct? (I should declare a global lock instead?) — tester, Sep 07 '15 at 18:54
But for pthread_create I could only send in one parameters, is there any way other than using struct or global variables? — tester, Sep 07 '15 at 19:01
@tester Don't use thread creation to tell threads what to do. Tell the thread what to do after you've created it. It's much easier that way, and you can pass whatever you like to the thread. If you insist on doing it at thread creation, dynamically allocate a structure that holds everything the thread needs, pass the thread a pointer to that structure, and let the thread delete the structure when it's done with it. — David Schwartz, Sep 07 '15 at 19:46
@DavidSchwartz Thread create function I have used such as pthread_create and std::thread provide task function when created. Could you share an example on "Tell the thread what to do after created it" ? For the structure, since I need all my worker threads modify same "map", I chose to pass just single structure with same map to all the threads. — tester, Sep 07 '15 at 23:33
@tester Use a [waitable queue of std::function objects](http://stackoverflow.com/a/29742586/721269) or something similar. — David Schwartz, Sep 08 '15 at 00:46
@MikeMB is there any benefit to use std::thread instead of pthread in this case? — tester, Sep 08 '15 at 07:25
@tester: It is portable, has better syntax and can accept function objects. To be honest, I think there are a lot of problems with your code and I was thinking about writing an answer, but as I usually don't use POSIX primitives, I was wondering if there is a special reason for the p-threads, or if I could just use std::thread, std::mutex, std::lock_guard, std::atomic and all the other nice tools the standard library provides in my answer. — MikeMB, Sep 08 '15 at 07:38
@MikeMB it would be appreciated if you could provide an example with std::thread when you have time. I am new to thread implementation and pthread is just something I tried first. — tester, Sep 08 '15 at 07:46

Alex Lop. · Answer 1 · 2015-09-07T19:16:35.213

0

You should use mutex locking mechanism on each access to the map (in your case) and not only on adding a new 'bucket'. In case T1 tries to write some value to the map while T2 inserts a new bucket, the pointer/iterator which is used by T1 becomes invalid.

Regarding the pthread_cond_wait. It may do the job in case the only thing that the other threads do is just modifying the map. If they perform other calculations or process some non shared data, it is better to use the same mutex just to protect access to the map and let other threads do their job which may be at that point not related to the shared map.

edited Sep 07 '15 at 19:16

answered Sep 07 '15 at 19:09

Alex Lop.

6,810
1
26
45

@tester I have just edited my answer regarding the `pthread_cond_wait`. – Alex Lop. Sep 07 '15 at 19:17
Thanks. I added mutex protection in main_thread as your suggestion. The common use example of pthred_cond_wait usually goes with pthred_mutex_lock before and pthred_mutex_unlock after it. But I did not use the two locks since I need worker threads all wait and all waked up when main is broadcasting. Is there any side effect with this implementation? – tester Sep 08 '15 at 01:12
@tester I looked at it once again now and I think this time I understood what you meant. So yes, you need to wrap `pthread_cond_wait(&count_cond, &cs.mutex1)` with its condition check of `main_thread_work`, otherwise you get a race on it (`main_thread_work`) since the main thread can update it whenever it wishes so that the worker thread(s) will get an outdated value. – Alex Lop. Sep 08 '15 at 06:01

David Schwartz · Accepted Answer · 2015-09-08T05:12:23.933

0

while (coming data){
    if (main_thread_work){
        pthread_cond_wait(&count_cond, &cs.mutex1)
    }  

    pthread_mutex_lock(&cs.mutex1);

This clearly can't be right. You can't check main_thread_work unless you hold the lock that protects it. How can the call to pthread_cond_wait release a lock it doesn't hold?!

This should be something like:

void* task(void* arg){
    struct cStruct cs = * (struct cStruct*) arg;

    // Acquire the lock so we can check shared state
    pthread_mutex_lock(&cs.mutex1);

    // Wait until the shared state is what we need it to be
    while (main_thread_work)
        pthread_cond_wait(&count_cond, &cs.mutex1)

    // Do whatever it is we're supposed to do when the
    // shared state is in this state
    cs.map1(thread_identifier)=processed_value;

    // release the lock
    pthread_mutex_unlock(&cs.mutex1);
}

edited Sep 08 '15 at 05:12

answered Sep 08 '15 at 05:04

David Schwartz

179,497
17
214
278

Another thread runs main_thread_task() control the main_thread_work and hold the mutex. I saw must of the pthread_cond_wait() example use lock to protect it. But for me I need to have "multiple" worker threads to stop here, until thread runs main_thread_task() do broadcasting. Not sure this would cause problems. Please share your suggestion on this.. Thank you! – tester Sep 08 '15 at 05:10
@tester Okay. So what? (Do you understand that `pthread_cond_wait` is an unconditional, atomic "unlock and wait" operation?) – David Schwartz Sep 08 '15 at 05:10
I am new to pthread but "unlock and wait" is really good point to me! So pthread_cond_wait protects the region between pthread_mutex_lock and pthread_cond_wait? Another question is: if I have nothing to do between pthread_mutex_lock and pthread_cond_wait, is it necessary to add pthread_mutex_lock before do the pthread_cond_wait? – tester Sep 08 '15 at 05:19
@tester It's necessary, because how else would you know that you needed to wait? The thing you're waiting for could have already happened. The mutex protects the shared state. It's always `lock(); while (state_is_wrong) wait(); do_stuff(); unlock();` – David Schwartz Sep 08 '15 at 05:21
"The thing you're waiting for could have already happened". Does that mean pthred_cond_broadcast happens before pthread_cond_wait? In this case the wait thread would not be waked up. But one thing I want to avoid is that if worker threads runs in very "high frequency", they might lock with each other and slow down. In my main_thread_task function, I swap the flag main_thread_work before pthred_cond_broadcast, and tried to use this flag to protect pthred_cond_wait. – tester Sep 08 '15 at 05:37
@tester To be honest, it sounds like you just don't understand how to use condition variables. You seem to keep inventing strange ways to solve common problems and resisting my advice to do it the simple, normal way that these synchronization primitives were specifically designed for. Are you such an expert that you know better than the folks who designed the standards and functions? Stop trying to solve imagined problems and use condition variables and locks correctly. – David Schwartz Sep 08 '15 at 06:11
I agree with you using the primitive way to protect shared data. The high frequency inter lock between threads would be another problem. – tester Sep 08 '15 at 07:18
@tester What are you talking about? What is "high frequency inter lock"? And what does that have to do with using condition variables *correctly*? – David Schwartz Sep 08 '15 at 07:26
In my implementation I will have many worker threads loop the task function with very high frequency. That's the reason why I thought a lock for each operation might slow down the process, and tried some other ways to solve it. – tester Sep 08 '15 at 07:42
@tester So you thought you might have a problem and then started trying to solve it? Do you think that's how optimization is done? (You're wrong. If there is an issue, it will be from the contention, not from the lock.) – David Schwartz Sep 08 '15 at 07:43
yeah. I think the issue is from the contention on the lock. – tester Sep 08 '15 at 08:07
@tester If you use a lock, the contention will be on the lock. If you use something other than a lock, the contention will be on that. The problem is the contention, not the lock. So if the contention actually turns out to be a problem, you'll need to figure out a way to minimize contention. But long before you even think about that, you need to use a condition variable and a lock correctly many, many times. – David Schwartz Sep 08 '15 at 08:08

Access pthread shared std:map without data race

2 Answers2