
I don't know whether this is good practice or not, but I am working on a real-time stream of input data and using pthreads in lockstep order so that only one thread at a time performs each stage of the pipeline. This is my program flow for each thread:

void *my_thread(void *arg) {
    pthread_mutex_lock(&read_mutex);

    /*
     Read data from a stream such as stdin into a global buffer
    */

    pthread_mutex_lock(&operation_mutex);
    pthread_mutex_unlock(&read_mutex);

    /*
     Perform some work on the data you read
    */

    pthread_mutex_lock(&output_mutex);
    pthread_mutex_unlock(&operation_mutex);

    /*
     Write the data to output such as stdout
    */
    pthread_mutex_unlock(&output_mutex);

    return NULL;
}

I know there are pthread condition variables, but is my approach a good idea or a bad one? I tested this on streams of various sizes, and I am trying to think of corner cases that would make this deadlock, produce a race condition, or both. I know mutexes don't guarantee thread execution order, but I need help thinking of scenarios that will break this.

UPDATE:

I stepped away from this, but recently had some time to rethink it. I rewrote the code using C++ threads and mutexes. I am trying to use condition variables but have had no luck. This is my approach to the problem:

void my_thread_v2() {
    //Let only 1 thread read in at a time
    std::unique_lock<std::mutex> stdin_lock(stdin_mutex);
    stdin_cond.wait(stdin_lock);

    /*
    Read from stdin stream
    */

    //Unlock the stdin mutex
    stdin_lock.unlock();
    stdin_cond.notify_one();

    //Lock step
    std::unique_lock<std::mutex> operation_lock(operation_mutex);
    operation_cond.wait(operation_lock);

    /*
     Perform work on the data that you read in
     */

    operation_lock.unlock();
    operation_cond.notify_one();

    std::unique_lock<std::mutex> stdout_lock(stdout_mutex);
    stdout_cond.wait(stdout_lock);

    /*
     Write the data out to stdout
     */

    //Unlock the stdout mutex
    stdout_lock.unlock();
    stdout_cond.notify_one();
}

I know the issue with this code is that there is no way to signal the first condition variable. I definitely don't understand the proper use of condition variables. I looked at various examples on cppreference, but can't get away from the thought that the initial approach may be the only way of doing what I want, which is to lock-step the threads. Can someone shed some light on this?

UPDATE 2:

So I implemented a simple Monitor class that utilizes C++ condition_variable and unique_lock:

class ThreadMonitor{
public:
    ThreadMonitor() : is_occupied(false) {}
    void Wait() {
        std::unique_lock<std::mutex> lock(mx);
        while(is_occupied) {
            cond.wait(lock);
        }
        is_occupied = true;
    }

    void Notify() {
        std::unique_lock<std::mutex> lock(mx);
        is_occupied = false;
        cond.notify_one();
    }

private:
    bool is_occupied;
    std::condition_variable cond;
    std::mutex mx;
};

This is my initial approach, assuming I have three ThreadMonitors called stdin_mon, operation_mon, and stdout_mon:

void my_thread_v3() {
    //Let only 1 thread read in at a time
    stdin_mon.Wait();

    /*
    Read from stdin stream
    */

    stdin_mon.Notify();

    operation_mon.Wait();

    /*
     Perform work on the data that you read in
     */

    operation_mon.Notify();

    stdout_mon.Wait();
    /*
     Write the data out to stdout
     */

    //Unlock the stdout monitor
    stdout_mon.Notify();
}

The issue with this was that the data was still being corrupted, so I had to change back to the original logic of lock-stepping the threads:

void my_thread_v4() {
    //Let only 1 thread read in at a time
    stdin_mon.Wait();

    /*
    Read from stdin stream
    */

    operation_mon.Wait();
    stdin_mon.Notify();

    /*
     Perform work on the data that you read in
     */

    stdout_mon.Wait();
    operation_mon.Notify();

    /*
     Write the data out to stdout
     */

    //Unlock the stdout monitor
    stdout_mon.Notify();
}

I am beginning to suspect that if thread order matters, this is the only way to handle it. I am also questioning what the benefit is of using a monitor built on a condition_variable over just using a mutex.

  • Ah, got it. Is there a recommended approach to ordering thread execution via pthreads? – mjl007 Aug 06 '18 at 09:36
  • C or C++? Solution might look different depending on language (in C++, you might prefer standard's synchronisation facilities: `std::mutex`, `std::shared_mutex`, `std::lock_guard`, `std::unique_lock`, `std::shared_lock`, ...)! – Aconcagua Aug 06 '18 at 09:57
  • preferably C++ but I would like to know for C as well. – mjl007 Aug 06 '18 at 10:10
  • 1
    @nos i dont think its possible for thread 2 to run through the whole function while thread 1 is still in there and holding a lock. because in the function the next lock is aquired before the previous is released – phön Aug 06 '18 at 10:25
  • @nos there are 3 different locks, and the unlocks are called after the next lock is already acquired. So once you have the first lock, you basically lock the whole function – phön Aug 06 '18 at 10:48
  • @phön Yes, you are right – nos Aug 06 '18 at 10:49
  • @phön Only partially correct: As soon as first thread has released the first mutex, second thread can run up to acquiring second mutex *simultaneously*, so partial overlaps are possible (but not second thread "overtaking" first one). – Aconcagua Aug 06 '18 at 11:17
  • @Aconcagua Yes. My comment was in respect to "overtaking" so my wording is not thaaaat accurate, but you are right. – phön Aug 06 '18 at 11:22
  • Using a per-thread buffer would allow skipping the operation_mutex. – Michael Surette Aug 06 '18 at 23:34
  • @Aconcagua In your scenario how can the second thread run up and grab the second mutex as soon as the first thread releases the first mutex? Even within a context switch the first thread will always lock the next section before unlocking the previous section. – mjl007 Aug 07 '18 at 02:24
  • @mjl007 Second thread can run up to the point where it is *about to* grab, but cannot do so until first thread releases the second mutex as well (thus no "overtaking"). – Aconcagua Aug 07 '18 at 08:02
  • If the problem is that data is being corrupted, why not just protect the shared data with a mutex? What's the benefit of all the additional complexity? (Also, your code using condition variables is all wrong. You can't just call `Wait` unconditionally.) – David Schwartz Nov 09 '18 at 18:17

1 Answer


The problem with your approach is that you can still modify the data while another thread is reading it:

  1. Thread A acquires read, then operation, releases read again, and starts writing some data, but is interrupted.
  2. Now thread B operates, acquires read, and can read the partially modified, possibly inconsistent data!

I assume you want to allow multiple threads to read the same data without blocking, but as soon as a thread writes, the data must be protected. Finally, while outputting, we are just reading the modified data again and thus can do this concurrently as well, but we need to prevent simultaneous writes.

Instead of having multiple mutex instances, you can do this better with a read/write mutex:

  1. Any function that only reads the data acquires the read lock.
  2. Any function intending to write acquires the write lock right from the start (be aware that acquiring the read lock first and then the write lock without releasing the read lock in between can result in deadlock; if you do release the read lock in between, though, your data handling needs to be robust against the data being modified by another thread in the meantime!).
  3. Downgrading the write lock to a shared (read) lock without releasing it in between is safe, so we can do this before outputting. If the data must not be modified between writing it and outputting it, we actually need to do this without entirely releasing the lock.

The last point is problematic, as it is supported neither by the C++ standard's thread support library nor by the pthreads library.

For C++, Boost provides a solution (an upgradable read/write lock); if you don't want to or cannot (C!) use Boost, a simple, though possibly not the most efficient, approach is to protect acquisition of the write lock with another mutex:

  1. acquire the standard (non-RW) mutex protecting the read/write mutex
  2. acquire the RW mutex for writing (any other writer is still blocked at step 1, so this cannot deadlock against a thread that is downgrading)
  3. read data, write modified data
  4. release the RW mutex and immediately re-acquire it for reading; it does not matter if another thread acquires it for reading as well, we only need to prevent a writer from getting in between, and every writer is still blocked on the protecting mutex
  5. release the protecting mutex
  6. output
  7. release the RW mutex (no need to protect)...

Non-modifying functions can just acquire the read lock without any further protection; there aren't any conflicts with...

In C++, you'd prefer the standard thread support library and additionally get platform-independent code for free; in C, you would use a standard pthread mutex to protect acquisition of the write lock, just as above, and the RW variants from pthreads for the read/write lock.

  • Sorry if I wasn't clear. I am working off of a data stream such as stdin. I do want multiple threads to read the data, but since the data size from the stream is indeterminate and the order matters I am not sure if your approach will work in my case. But I may not be understanding your approach as well. – mjl007 Aug 06 '18 at 11:58
  • @mjl007 Hm, that changes the situation quite a lot. The problem of one thread overwriting another thread's data remains if you use a global buffer; read the data into a thread-local buffer and you'll be fine. With the given design, at most three threads can operate simultaneously. If the data calculation (operation section) is *much more complex* than reading/writing the streams, you might protect only the reads/writes from/to the streams (so that more threads can operate at once), and manage the correct output sequence by other means... – Aconcagua Aug 06 '18 at 13:53