
I have a question about race conditions and simultaneous writes.

I have a class whose objects are accessed from different threads. I would like to calculate some values only on demand and cache the results. For performance reasons I'd rather not use locks (before anyone asks: yes, it is relevant in my case).

This constitutes a race condition. However, the objects are const and won't be changed, so if different threads calculate values to be cached, they are, in my use case, guaranteed to be identical. Would it be safe to write these values without locking? Or, in broader terms, is it safe to write identical content to memory from different threads without locking?

The values written are of types bool and double and the architectures in question may be x86 and ARM.

EDIT: Thanks to everyone for their input. I have finally decided to find a way that does not involve caching. The caching approach seems too much like a 'hack', and there is the problem with using a flag variable.

Gabriel Schreiber
  • I doubt C++ guarantees it, but at least with modern hardware, it's hard to imagine how a problem would arise. – Jerry Coffin Apr 03 '12 at 17:27
  • Each write operation to a primitive is atomic, so it shouldn't make a difference if many threads are simultaneously writing to the same set of fields. – Alain Apr 03 '12 at 17:28
  • "So if different threads calculate values to be cached they are in my use case guaranteed to be identical"... This is strange. If you can guarantee that the values written will be identical, why write them concurrently? Or better, why write them more than once? – Paul Michalik Apr 03 '12 at 17:36
  • @Paul Michalik: They are calculated and thus written (stored) on demand. To write them only once locking would be required because demand may occur in different threads. – Gabriel Schreiber Apr 03 '12 at 17:41
  • I see. So then use a condition variable which signals whether a datum is already computed or not. This can be manipulated atomically inside of some kind of spin-lock using interlocked "CompareExchange" however this is implemented for your platform of choice. I think this is the smallest synchronization overhead you can get... – Paul Michalik Apr 03 '12 at 17:49

4 Answers


As you say, this is a race condition. Under C++11 it is technically a data race, and undefined behaviour. It doesn't matter that the values are the same.

If your compiler supports it (e.g. recent gcc, or gcc or MSVC with my Just::Thread library) then you can use std::atomic<some_pod_struct> to provide an atomic wrapper around your data (assuming it is a POD struct --- if it isn't then you have bigger problems). If it is small enough then the compiler will make it lock-free, and use the appropriate atomic operations. For larger structures the library will use a lock.

The problem with doing this without atomic operations or locks is visibility. Whilst there is no problem at the processor level on either x86 or ARM with writing the same data (assuming it really is byte-for-byte identical) from multiple threads/processors to the same memory, given that this is a cache, I expect you'll want to read this data rather than recalculate it if it has already been written. You'll therefore need some sort of flag to indicate done-ness. Unless you use atomic operations, locks or suitable memory barrier instructions then the "ready" flag may become visible to another processor before the data does. This will then really mess things up, as the second processor now reads an incomplete set of data.

You could write the data with non-atomic operations, and then use an atomic data type for the flag. Under C++11 this will generate suitable memory barriers and synchronization to ensure that the data is visible to any thread that sees the flag set. It's still undefined behaviour for two threads to write the data, but it may be OK in practice.

Alternatively, store the data in a block of heap memory allocated by each thread that does the calculation, and use a compare-and-swap operation to set an atomic pointer variable. If the compare-and-swap fails then another thread got there first, so free the data.

Anthony Williams
  • Selected as best answer for pointing out the problem with a flag variable and the need for a memory barrier. – Gabriel Schreiber Apr 04 '12 at 13:06
  • @Anthony, +1 for the technical meaning of data race in C++ vs the general concept of race condition. C++ only specifies what a data race is and the UB when it happens. So if a shared variable is a C++11 atomic type, it's guaranteed to be data race free, even if tagged with memory_order_relaxed ordering. The standard doesn't mention the general race condition concept and doesn't say there is UB when that happens. It totally depends on programmers to add proper sync primitives to ensure the correct semantics. Do you agree? – Eric Z Aug 29 '14 at 02:50
  • @Anthony, there are some others believing that there is still data race in C++11 even if the shared data is of atomic type with relaxed ordering. I've a hard time trying to explain these concepts (correctly defined in the language as well as your book) to them. Could you please take a look at [this post](https://groups.google.com/forum/#!topic/comp.lang.c++.moderated/1SN85LRbouw) and share us with your opinion? Thanks. – Eric Z Aug 29 '14 at 02:54

You do not need to protect writes to POD variables from different threads if the values written are the same. If there are pointers involved, though, you should definitely do an interlocked exchange.

UPDATE: To clarify, in your situation caching and optimizations will not have any adverse effects, since you are writing the exact same value from all threads. For the same reason, you do not need to make the variable volatile. The only thing that could potentially be a problem is if your variable is not aligned to the word size of the machine. See https://stackoverflow.com/a/54242/677131 for more details. By default, variables are automatically aligned, but you can change the alignment explicitly.

There is an alternative approach that avoids this issue altogether. Since the variables will have the same values, either precompute them before concurrent execution starts, or let each thread have its own copy. The latter has the advantage of providing better performance on NUMA machines.

Lubo Antonov

I have to start by saying that using locking is usually the right way to do this, but...

Writing to the same variable from multiple threads can't be unsafe here, even if the data is bigger than the processor's word size. There is no transitional state in which the variable can be corrupted, because at least one thread will finish writing the value, and the other threads will not change it by writing the same value over it.

So if there is a guarantee that the results of the computation will always be the same no matter which thread runs it, there is no danger in letting multiple threads do it. Just check a flag ("is it already computed?") before doing the calculation. Multiple threads may enter the value-computing code, but once it is done, no other threads will do so any more. Obviously, doing the same thing n times is a waste of time. The question here is: will using a lock save you any time, or the contrary? Only performance testing can give you the answer, unless there are other reasons for not using locks.

Maciej

Ultimately the answer would probably depend on your data structure.

In the "non-portable" realm, you might want to look into compare and swap, which most processors will let you do on a pointer-sized entity. To access that you can use inline assembly (on x86 these are the lock cmpxchg instructions), or perhaps GCC synchronization extensions. Upon seeing an uninitialized value, each thread could eagerly initialize, and issue a compare-and-swap to attempt to set a value. If the compare-and-swap fails it means another thread has beat you to it.

Ultimately usage of that operation often ends up being the equivalent of implementing a spinlock, though, which you may be looking to avoid ...

asveikau
  • Part of the data structure will be an 8-byte double on (32-bit) ARM, so it is larger than a 'pointer-sized entity'. So this would not be an option. – Gabriel Schreiber Apr 03 '12 at 17:39
  • @GabrielSchreiber - That doesn't mean it's not an option. For example, you could initialize a pointer to the structure instead of the struct itself. Or you could have a "companion" word that essentially serves as a lock around the initialization, to determine whether it's been initialized before. However, in general I would encourage you to be careful with calling things "atomic" if they rely on writing multiple words at a time. – asveikau Apr 03 '12 at 17:42
  • @GabrielSchreiber - I must stress that if you want read-modify-write operations to be atomic, compare-and-swap (or the platform-specific equivalent; "load-link/store-conditional" being the RISC version) is pretty much the only way to go. – asveikau Apr 03 '12 at 17:54