
I'm not sure I got the terminology right, but here goes. I have this function that is used by multiple threads to write data (using pseudocode in comments to illustrate what I want):

//these are initialized in the constructor
int* data; 
std::atomic<size_t> size;

void write(int value) {
    //wait here while "read_lock"
    //set "write_lock" to "write_lock" + 1
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = value;
    //set "write_lock" to "write_lock" - 1
}

The order of the writes is not important; all I need is for each write to go to a unique slot.

Every once in a while, though, I need one thread to read the data using this function:

int* read() {
    //set "read_lock" to true
    //wait here while "write_lock"
    int* ret = data;
    data = new int[capacity];
    size = 0;
    //set "read_lock" to false
    return ret;
}

So it basically swaps out the buffer and returns the old one (I've removed the capacity logic to keep the snippets shorter).

In theory this should lead to 2 operating scenarios:

1 - just a bunch of threads writing into the container

2 - when some thread executes the read function, all new writers have to wait; the reader waits until all existing writes are finished, then does the read logic, and scenario 1 can continue.

The question is what kind of barrier to use for these locks:

A spinlock would be wasteful, since there are many containers like this and they all need CPU cycles.

I don't know how to apply std::mutex, since I only want the write function to be in a critical section when the read function is triggered. Wrapping the whole write function in a mutex would cause unnecessary slowdown in operating scenario 1.

So what would be the optimal solution here?

user81993
  • I guess this question might be helpful https://stackoverflow.com/questions/14306797/c11-equivalent-to-boost-shared-mutex – Jan Korous Jun 22 '16 at 23:44
  • So you don't need to synchronize the writer threads among themselves because they always access unique parts of the data? – Galik Jun 22 '16 at 23:56
  • @Galik the writer threads in this case are pooled AI's, UI and world simulation effects running at various frequencies, I need to pipe their outputs to the GPU together which is running at its own update interval but they have nothing to do with each other during the transportation – user81993 Jun 23 '16 at 01:11
  • @user81993 Is it possible you have typos with overusing the identifier `data` in your first code snippet? The local variable is shadowing the global one. – Nir Friedman Jun 23 '16 at 01:22
  • @NirFriedman oh yea, oops. – user81993 Jun 23 '16 at 02:00

2 Answers


If you have C++14 available, you can use a std::shared_timed_mutex to separate readers and writers. In this scenario it seems you need to give your writer threads shared access (allowing other writer threads in at the same time) and your reader thread unique access (kicking all other threads out).

So something like this may be what you need:

#include <memory>
#include <mutex>
#include <shared_mutex>

class MyClass
{
public:
    using mutex_type = std::shared_timed_mutex;
    using shared_lock = std::shared_lock<mutex_type>;
    using unique_lock = std::unique_lock<mutex_type>;

private:
    mutable mutex_type mtx;

public:

    // All updater threads can operate at the same time
    auto lock_for_updates() const
    {
        return shared_lock(mtx);
    }

    // Reader threads need to kick all the updater threads out
    auto lock_for_reading() const
    {
        return unique_lock(mtx);
    }
};

// many threads can call this
void do_writing_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_updates();

    // update the data here
}

// access the data from one thread only
void do_reading_work(std::shared_ptr<MyClass> sptr)
{
    auto lock = sptr->lock_for_reading();

    // read the data here
}

The shared_locks allow other threads to gain a shared_lock at the same time but prevent a unique_lock from gaining simultaneous access. When a reader thread tries to gain a unique_lock, all shared_locks must be released before the unique_lock gets exclusive control.
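
Applied to the buffer from the question, a minimal sketch might look like this (the SlotBuffer name and members are placeholders, capacity handling is omitted as in the question, and the relaxed ordering on the fetch_add is an assumption that the mutex already provides the necessary synchronisation between writers and the reader):

#include <atomic>
#include <cstddef>
#include <mutex>
#include <shared_mutex>

class SlotBuffer
{
    mutable std::shared_timed_mutex mtx;
    int* data;
    std::size_t capacity;
    std::atomic<std::size_t> size{0};

public:
    explicit SlotBuffer(std::size_t cap) : data(new int[cap]), capacity(cap) {}
    ~SlotBuffer() { delete[] data; }

    // Many writer threads may hold the shared lock at the same time.
    void write(int value)
    {
        std::shared_lock<std::shared_timed_mutex> lock(mtx);
        auto slot = size.fetch_add(1, std::memory_order_relaxed);
        data[slot] = value;
    }

    // The reader takes the lock exclusively, so it waits for in-flight
    // writers to finish and blocks new ones while it swaps the buffer.
    int* read()
    {
        std::unique_lock<std::shared_timed_mutex> lock(mtx);
        int* ret = data;
        data = new int[capacity];
        size = 0;
        return ret;
    }
};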

Galik

You can also do this with regular mutexes and condition variables rather than a shared mutex. Supposedly shared_mutex has higher overhead, so I'm not sure which will be faster. With Galik's solution you'd presumably be paying to lock the shared mutex on every write call; I got the impression from your post that write gets called far more often than read, so maybe that is undesirable.

#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <utility>

int* data;              // initialized somewhere
std::size_t capacity;   // likewise (capacity logic omitted, as in the question)
std::atomic<std::size_t> size{0};
std::atomic<bool> reading{false};
std::atomic<int> num_writers{0};
std::mutex entering;
std::mutex leaving;
std::condition_variable cv;

void write(int x) {
    ++num_writers;
    if (reading) {
        // A reader has started; back out and wait for it to finish.
        --num_writers;
        if (num_writers == 0)
        {
            // We may have been the last writer the reader is waiting for.
            std::lock_guard<std::mutex> l(leaving);
            cv.notify_one();
        }
        // Block here until the reader releases the entering mutex.
        { std::lock_guard<std::mutex> l(entering); }
        ++num_writers;
    }
    auto slot = size.fetch_add(1, std::memory_order_acquire);
    data[slot] = x;
    --num_writers;
    if (reading && num_writers == 0)
    {
        std::lock_guard<std::mutex> l(leaving);
        cv.notify_one();
    }
}

int* read() {
    int* other_data = new int[capacity];
    {
        std::unique_lock<std::mutex> enter_lock(entering);
        reading = true;
        std::unique_lock<std::mutex> leave_lock(leaving);
        cv.wait(leave_lock, [] () { return num_writers == 0; });
        std::swap(data, other_data);
        size = 0;
        reading = false;
    }
    return other_data;
}

It's a bit complicated and took me some time to work out, but I think this should serve the purpose pretty well.

In the common case where only writing is happening, reading is always false. So you do the usual work and pay only for two additional atomic operations on num_writers and two untaken branches. The common path therefore does not need to lock any mutexes, unlike the solution involving a shared mutex, which is supposedly expensive: http://permalink.gmane.org/gmane.comp.lib.boost.devel/211180.

Now, suppose read is called. The expensive, slow heap allocation happens first, and meanwhile writing continues uninterrupted. Next, the entering lock is acquired, which has no immediate effect. Then reading is set to true. From that point on, any new call to write enters the first branch and eventually hits the entering lock, which it is unable to acquire (as it's already taken), so those threads are put to sleep.

Meanwhile, the read thread is now waiting on the condition that the number of writers is 0. If we're lucky, this could go through right away. If, however, there are threads in write at either of the two points between incrementing and decrementing num_writers, it will not. Each time a write thread decrements num_writers, it checks whether it has reduced that number to zero, and when it has it signals the condition variable. Because num_writers is atomic, which prevents various reordering shenanigans, it is guaranteed that the last such thread will see num_writers == 0; the condition variable may also be notified more than once, but that is fine and cannot result in bad behavior.

Once the condition variable has been signalled, all writers are either blocked in the first branch or done modifying the array, so the read thread can safely swap the buffers, unlock everything, and return the old buffer.

As mentioned before, in typical operation there are no locks, just increments and untaken branches. Even when a read does occur, the read thread will have one lock and one condition variable wait, whereas a typical write thread will have about one lock/unlock of a mutex and that's all (one, or a small number of write threads, will also perform a condition variable notification).
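
For completeness, a hypothetical driver for the functions above might look like the following (it assumes data and capacity have already been set up, and that capacity is large enough that the buffer never overflows between reads):

#include <chrono>
#include <thread>
#include <vector>

int main()
{
    std::vector<std::thread> writers;
    for (int t = 0; t < 4; ++t) {
        // several threads writing concurrently, each into unique slots
        writers.emplace_back([t] {
            for (int i = 0; i < 1000; ++i)
                write(t * 1000 + i);
        });
    }

    // one thread occasionally swapping the buffer out and consuming it
    std::thread reader([] {
        for (int i = 0; i < 10; ++i) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            int* old = read();
            delete[] old;
        }
    });

    for (auto& w : writers) w.join();
    reader.join();
}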

Nir Friedman