2

I'm using C++11 and I'm aware that concurrent writes to std::vector<bool> someArray are not thread-safe due to the specialization of std::vector for bools.

I'm trying to find out if writes to bool someArray[2048] have the same problem:

  • Suppose all entries in someArray are initially set to false.
  • Suppose I have a bunch of threads that write at different indices in someArray. In fact, these threads only set different array entries from false to true.
  • Suppose I have a reader thread that at some point acquires a lock, triggering a memory fence operation.

Q: Will the reader see all the writes to someArray that occurred before the lock was acquired?

Thanks!

Alin Tomescu
  • if you need thread safety just use [`std::atomic_bool`](http://en.cppreference.com/w/cpp/atomic/atomic) and call it done – Mgetz Oct 12 '17 at 20:16

3 Answers

4

You should use std::array<bool, 2048> someArray, not bool someArray[2048];. If you're in C++11-land, you'll want to modernize your code as much as you are able.

std::array<bool, N> is not specialized the way that std::vector<bool> is, so there are no extra safety concerns there.

As for your actual question:

Will the reader see all the writes to someArray that occurred before the lock was acquired?

Only if the writers to the array also interact with the lock, either by releasing it when they finish writing, or else by updating a value associated with the lock that the reader then synchronizes with. If the writers never interact with the lock, the reader has no guarantee of seeing their writes, and unsynchronized concurrent access to the same locations is a data race.

One thing you'll also want to bear in mind: while it's not unsafe to have multiple threads write to the same array, provided that they all write to distinct memory addresses, writing can still be slowed dramatically by cache effects, in particular false sharing, where different threads write to distinct addresses that happen to sit on the same cache line. For example:

#include <array>
#include <cstddef>
#include <thread>
#include <vector>

void func_a() {
    std::array<bool, 2048> someArray{};
    std::vector<std::thread> writers;
    for(int i = 0; i < 8; i++) {
        // Each thread writes its own contiguous 256-element slice
        writers.emplace_back([i, &someArray]{
            for(std::size_t index = i * 256; index < std::size_t(i + 1) * 256; index++)
                someArray[index] = true;
        });
    }
    // Join stands in for whatever synchronization mechanism you work out yourself;
    // detaching here would leave the threads writing to a destroyed local array.
    for(auto& writer : writers) writer.join();
}

void func_b() {
    std::array<bool, 2048> someArray{};
    std::vector<std::thread> writers;
    for(int i = 0; i < 8; i++) {
        // Each thread's writes are interleaved with every other thread's
        writers.emplace_back([i, &someArray]{
            for(std::size_t index = i; index < 2048; index += 8)
                someArray[index] = true;
        });
    }
    for(auto& writer : writers) writer.join();
}

The details are going to vary depending on the underlying hardware, but in nearly all situations func_a is going to be dramatically faster than func_b: each thread in func_b shares cache lines with its neighbors, so those lines bounce between cores on every write (false sharing), while each thread in func_a works on its own contiguous region. Both functions produce the same result, but one will be considerably faster than the other (2048 was chosen as an example size and may not be representative of the actual underlying performance difference).

Xirema
  • Thank you Xirema! This is very good to know, I've read something similar in another SO post. Also, I actually have an `std::atomic` that all writers will decrement _after_ writing the bool array. I suppose the reader could simply read that same atomic variable to trigger a memory fence? – Alin Tomescu Oct 12 '17 at 20:21
  • 1
    @AlinTomescu Yes, that would probably be enough. However, I would recommend you instead write/acquire a proper *Reader/Writer Lock* to guarantee correct behavior, or else use a Mutex + Semaphore (which is how most Reader/Writer Locks are implemented). – Xirema Oct 12 '17 at 20:24
  • @Xirema or they could just use atomics... which don't need reader writer locks. – Mgetz Oct 12 '17 at 20:26
  • @Mgetz Atomics are how Semaphores are (often) implemented, which are a component of R/W Locks. I'm trying to steer the OP away from low-level synchronization towards higher-level synchronization, since low-level synchronization objects are a big, big source of errors and confusion. – Xirema Oct 12 '17 at 20:28
  • @Xirema if they are just making an array of flags they don't need high level primitives that wait and block; atomics will perform that function perfectly. Using higher level synchronization primitives is like trying to treat a cold with open heart surgery. – Mgetz Oct 12 '17 at 20:38
  • @Xirema, I'm actually wondering what's _the least_ I can do in this case for performance reasons (assuming I fix the problem you mentioned). Is there anything better than incrementing an atomic in the writing thread and reading it in the reader thread? – Alin Tomescu Oct 12 '17 at 23:08
  • @AlinTomescu not that's safe according to the standard, no – Mgetz Oct 13 '17 at 12:29
  • @Mgetz, why not? Reading [here](http://en.cppreference.com/w/cpp/atomic/memory_order#Release-Acquire_ordering) it says: _"If an atomic store in thread A is tagged `memory_order_release` and an atomic load in thread B from the same variable is tagged `memory_order_acquire`, **all memory writes (non-atomic and relaxed atomic) that happened-before the atomic store from the point of view of thread A, become visible side-effects in thread B**, that is, once the atomic load is completed, thread B is guaranteed to see everything thread A wrote to memory."_ – Alin Tomescu Oct 13 '17 at 14:15
  • @AlinTomescu I was referring to something not `atomic`. Atomic is the only safe way for a variable to be read concurrently across threads. What you're describing is using them for some sort of spin-lock, in which case you might as well follow Xirema's advice and use a higher level primitive. – Mgetz Oct 13 '17 at 14:53
  • Thank you Mgetz. Perhaps I should ask a separate question on this issue. I'm familiar with using high-level primitives for synchronization. However, right now, I'm writing some performance critical code that I would like "all juice squeezed out of." – Alin Tomescu Oct 13 '17 at 14:57
0

First of all, the general std::vector is not as thread-safe as you might think; the guarantees are already stated here.

Addressing your question: the reader may not see all of the writes after acquiring the lock. That is because the writers may never have performed a release operation, which is required to establish a happens-before relationship between the writes and the subsequent reads. In (very) simple terms: every acquire operation (such as a mutex lock) needs a release operation to synchronize with. Every memory operation performed before a release on a certain variable will be visible to any thread that acquires the same variable. See also Release-Acquire ordering.

Jodocus
  • To be clear, the thing that has to match the acquire operation inherent in locking a mutex is the release operation inherent in unlocking *that same mutex*. I.e. those two operations establish an ordering by rendezvousing on the address of the mutex. The release/acquire part can also be established using atomic_store and atomic_load on the array of bools, though avoiding races will still require some form of mutexing. – Zalman Stern Oct 12 '17 at 20:15
  • Thank you! The release-acquire ordering is good to know actually. However, it seems that `std::vector` (when `T != bool`) does support concurrent writes: see 3rd item in "Thread safety" section [here](http://en.cppreference.com/w/cpp/container) which says _"Different elements in the same container can be modified concurrently by different threads, except for the elements of std::vector."_ Obviously, if you `push_back`, as described in the previous SO post, that's not thread-safe. – Alin Tomescu Oct 12 '17 at 20:16
0

One important thing to note is that on x86 and x64 architectures, aligned loads and stores of variables up to the native word size (which includes a bool) are atomic at the hardware level. So if you declare your array as volatile (to prevent the compiler from keeping a stale cached value of the array), you may get away with modifying the array from multiple threads on those architectures. Note, however, that this relies on architecture-specific behavior: volatile is not a synchronization primitive in standard C++ and establishes no ordering guarantees, so portable C++11 code should use std::atomic instead.