10

Is there any guarantee by any commonly followed standard (ISO C or C++, or any of the POSIX/SUS specifications) that a variable (perhaps marked volatile), not guarded by a mutex, that is being accessed by multiple threads will become eventually consistent if it is assigned to?

To provide a specific example, consider two threads sharing a variable v, with initial value zero.

Thread 1: v = 1

Thread 2: while(v == 0) yield();

Is thread 2 guaranteed to terminate eventually? Or can it conceivably spin forever because the cache coherency never kicks in and makes the assignment visible in thread 2's cache?

I'm aware the C and C++ standards (before C++0x) do not speak at all about threads or concurrency. But I'm curious if the C++0x memory model, or pthreads, or anything else, guarantees this. (Apparently this does actually work on Windows on 32-bit x86; I'm wondering if it's something that can be relied on generally or if it just happens to work there).
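Spelled out as a compilable sketch (the POSIX spellings here are my assumption; `sched_yield` stands in for `yield()`):

```cpp
#include <pthread.h>
#include <sched.h>

volatile int v = 0;   // shared, deliberately not guarded by a mutex

void* thread1_fn(void*) {            // Thread 1: v = 1
    v = 1;
    return nullptr;
}

void* thread2_fn(void*) {            // Thread 2: while(v == 0) yield();
    while (v == 0)
        sched_yield();               // whether this always terminates is
                                     // exactly what the question asks
    return nullptr;
}
```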

Jack Lloyd
  • 8,215
  • 2
  • 37
  • 47
  • Cache coherency is implemented on the CPU, and it **always** kicks in (at least on mainstream architectures). It's not something the software has *any* say over. If something is written to cache, it is written to memory, and all other threads will see it. That is not the issue with threading. The issue is whether a memory write *happens at all*, and if it happens at the expected time – jalf Jun 25 '10 at 09:48
  • 1
    It will work on the Intel architectures. I've heard rumors of architectures on which it will not work, but I've never seen one in person. – Omnifarious Oct 02 '11 at 06:15
  • 1
    ARM (for example) have designed multi-core architectures that don't have coherent cache. Not sure how much those designs are actually used. The advantage is you save some silicon and heat used to keep everything synched, but of course the disadvantage is that it confuses the heck out of people used to the Intel threading model. – Steve Jessop Nov 03 '11 at 02:47

6 Answers

12

It's going to depend on your architecture. While it is unusual to require an explicit cache flush or memory sync to ensure memory writes are visible to other threads, nothing precludes it, and I've certainly encountered platforms (including the PowerPC-based device I am currently developing for) where explicit instructions have to be executed to ensure state is flushed.

Note that thread synchronisation primitives like mutexes will perform the necessary work as required, but you typically don't need a full synchronisation primitive if all you want is to ensure the state is visible without caring about consistency - the sync / flush instruction alone will suffice.

EDIT: To anyone still in confusion about the volatile keyword - volatile guarantees the compiler will not generate code that explicitly caches data in registers, but this is NOT the same thing as dealing with hardware that transparently caches / reorders reads and writes. Read e.g. this or this, or this Dr Dobbs article, or the answer to this SO question, or just pick your favourite compiler that targets a weakly consistent memory architecture like Cell, write some test code and compare what the compiler generates to what you'd need in order to ensure writes are visible to other processes.
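To illustrate the distinction the answer draws (a sketch using C++11 spellings, which are not part of the original answer): a volatile store only obliges the compiler to emit the store instruction, while an atomic release store additionally emits whatever hardware barrier the target needs (`lwsync`, `dmb`, etc.).

```cpp
#include <atomic>

volatile int     flag_v = 0;
std::atomic<int> flag_a{0};

void publish_volatile() {
    flag_v = 1;   // compiler must emit a store, but no hardware barrier:
                  // on a weakly ordered CPU (PowerPC, ARM) earlier writes
                  // may still be buffered or reordered around it
}

void publish_atomic() {
    flag_a.store(1, std::memory_order_release);
                  // store plus the fence instruction the target requires,
                  // so earlier writes are visible before the flag is
}
```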

moonshadow
  • 86,889
  • 7
  • 82
  • 122
  • The compiler must do whatever it sees fit to guarantee that all writes to `volatile` variables actually hit main memory, and that will in turn make them visible to other threads. – David Rodríguez - dribeas Jun 24 '10 at 22:03
  • 13
    @David: this is mistaken. "Accesses to `volatile` objects must be evaluated strictly according to the abstract machine defined by the language standard." This is a statement about what kind of optimisations C++ may perform, not what extra processing the programmer may wish to do to deal with architectural quirks. It is saying the compiler must generate an explicit write instruction for each assignment in the source, but it does not say anything about generating `flush` or `sync` or `eieio` or whatever your CPU may need to actually cause the data to hit memory in program order or at all. – moonshadow Jun 24 '10 at 22:24
  • There are more statements about `volatile`. The critical one is that their reads and writes are observable side-effects. In particular, the loop from the question **must** read v repeatedly. It may not cache the value. Not in a register, not in L1 cache, not anywhere else. – MSalters Jun 25 '10 at 08:54
  • 5
    @MSalters: your conclusion, again, is incorrect. Reads and writes to volatiles are observable side effects: again, this is a statement about the kind of optimisations a compiler may not perform, not a statement about additional code it must generate. The compiler may not generate code that caches the volatile data, but the hardware caching data it was told to store is not the compiler's responsibility. – moonshadow Jun 25 '10 at 09:49
  • True that you don't usually need to do anything to make the state visible to other threads. But usually you *do* care about consistency (usually you're performing this synchronization in order to ensure that certain code is only executed at a certain time, in a certain state), and then the OP's code can no longer be relied on. – jalf Jun 25 '10 at 09:59
  • @moonshadow: you're making an artificial distinction between code generation and optimisations. The compiler must generate correct code, and it doesn't matter what algorithm it uses for that. The same thing applies to your hardware caching notion. A conforming C++ implementation will emit instructions to tell the hardware not to do that. Failing to include any necessary instruction makes an implementation nonconforming; cache directives are no exception. – MSalters Jun 28 '10 at 08:27
  • 2
    @MSalters: you are exaggerating the compiler's responsibility. It's up to the programmer, not the compiler, to select the most appropriate way of dealing with their hardware's concurrency issues. `volatile` is a way of telling the compiler not to reorder writes so you're not fighting the compiler as well. There is an excellent reason for this separation of responsibility: architectures like the Cell contain multiple synchronisation / fence / barrier instructions with vastly different costs and effects, and the compiler has no way of knowing which is the most appropriate to a given situation. – moonshadow Jun 28 '10 at 09:25
  • 1
    @MSalters: have a read of [PowerPC Architecture book II](http://www.ibm.com/developerworks/systems/library/es-archguide-v2.html) sections 3.2-3.3, and consider what a compiler that conforms to your interpretation of the C++ spec is supposed to emit. The compiler simply does not have the information for a sensible decision (has the memory page been configured with write-combining? Are you writing to memory shared between threads on the same PPU core, or to some device that is affected by `eieio` but not `lwsync`?), which is why the spec does not require it and real-world compilers do not do it. – moonshadow Jun 28 '10 at 09:32
  • @moonshadow: I could care, and a Cell compiler should provide an extension for that. But if I just say `volatile`, I want the compiler to do what the standard tells it to (generate observable reads and writes) without bothering me with details. Should it put volatile variables on page that is `eieo` but not `lwsync`? I don't care the slightest. – MSalters Jun 28 '10 at 09:33
  • @MSalters "_The critical one is that their reads and writes are observable side-effects._" This is a statement about the way the language is specified, not about the actual code that the implementation must emit. It's up to each implementation to defined what this means. Most do simple "load" and "store" for reads and writes of volatile variables. – curiousguy Oct 02 '11 at 05:19
  • "_explicit instructions have to be executed to ensure state is flushed._" Is there any case where the OS will not execute these instructions? – curiousguy Oct 02 '11 at 05:20
  • @MSalters "_The critical one is that their reads and writes are observable side-effects._" Observable from where? Who is the observer? – curiousguy Oct 17 '11 at 20:13
  • 1
    @curiousguy: Left unspecified by the standard. The standard talks about `observable behavior` only. That's the subset of behavior where the standard doesn't allow deviations under the as-if rule. Therefore, skipping a variable read from memory is allowed under the as-if rule **unless** that read is observable behavior => `volatile`. – MSalters Oct 18 '11 at 07:06
  • @MSalters So it doesn't say that you can "observe" volatile store/loads from another thread. Or from RAM. Or from some device. You can observe load/stores with a CPU emulator. With a debugger. You can see the **current state** of volatile variables with a signal handler sent **to the right thread**. – curiousguy Oct 19 '11 at 02:20
  • @curiousguy: you might want to ask a question of your own; comments are not the place to get an answer. – MSalters Oct 20 '11 at 07:53
5

If I've understood the relevant sections correctly, C++0x won't guarantee it for a standalone variable, or even a volatile one (volatile isn't designed for that use), but it will introduce atomic types for which you do get the guarantee (see the header <atomic>).
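A minimal sketch of the C++0x spelling this answer refers to (names from the then-draft `<atomic>` header; the helper function is mine):

```cpp
#include <atomic>
#include <thread>

std::atomic<int> v{0};   // replaces the plain (or volatile) int

void wait_for_flag() {
    while (v.load() == 0)           // a seq_cst load: guaranteed to
        std::this_thread::yield();  // eventually observe the store
}                                   // made by the other thread
```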

AProgrammer
  • 51,233
  • 8
  • 91
  • 143
3

First off, if it's not marked volatile, there is a good chance the compiler will load it only once. So regardless of whether the memory eventually changes, there is no guarantee the compiled code will see the new value.

Since you explicitly say "no mutexes", pthreads doesn't apply.

Beyond that, since C++ does not have a memory model, it depends on the hardware architecture.

R Samuel Klatchko
  • 74,869
  • 16
  • 134
  • 187
  • 1
    Not in that example if yield() is a function the compiler cannot see the body of, and the variable is not local to the compilation unit, since the compiler has to assume that the yield() function could change the value of v. @moonshadow's answer still applies, of course. – CesarB Jun 25 '10 at 00:38
  • N.B. this answer is no longer relevant. C++ has a memory model, which is followed by most compilers today (and they use that same model even for C++03 code, because it's the only portable and well-defined model we have). Even before C++11, relying on `volatile` for this was not portable as compilers didn't agree on its meaning. To write correct multithreaded C++ code before the C++11 memory model it was necessary to use platform-specific intrinsic functions for atomic operations, not rely on `volatile`. – Jonathan Wakely Dec 02 '15 at 11:07
3

This is a potential data race.

With respect to POSIX threads, this is UB. The same goes for C++, I believe.

In practice I cannot imagine how it could fail.

curiousguy
  • 8,038
  • 2
  • 40
  • 58
2

Is thread 2 guaranteed to terminate eventually? Or can it conceivably spin forever because the cache coherency never kicks in and makes the assignment visible in thread 2's cache?

If the variable is not volatile, you have no guarantees. Pre-C++0x, the standard just has nothing to say about threads, and since the variable is not volatile, reads/writes are not considered observable side effects, so the compiler is allowed to cheat. Post-C++0x, it's a race condition, which is explicitly stated to be undefined behavior.

If the variable is volatile, you get the guarantee that reads/writes will happen, and that the compiler won't reorder them with respect to other volatile memory accesses. (However, this does not by itself guarantee that the CPU won't reorder these memory accesses -- just that the compiler won't.)

But you have no guarantee that it won't be reordered with respect to other non-volatile accesses, so you might not get the behavior you expected. In particular, some of the instructions after the while loop, which you're trying to "protect", may be moved up ahead of the loop if the compiler deems it safe (and beneficial) to do so. But in performing this analysis, it looks only at the current thread, not at what happens in other threads.

So no, in general, it is not guaranteed to work correctly, even with volatile. It might, and it probably often will, but not always (and it depends on what happens after the loop). It depends on how far the compiler is willing to go with optimizations. But it is allowed to go far enough to break the code. So don't rely on it. If you want to synchronize around something like this, use memory barriers. That's what they're for. (And if you do that, you don't even need the volatile any more)
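One way to spell the memory barriers this answer recommends, using C++11 fences with relaxed atomics (a sketch, not part of the original answer; the variable names are mine):

```cpp
#include <atomic>

int payload = 0;                    // the data the loop is "protecting"
std::atomic<int> ready{0};

void producer() {
    payload = 42;
    std::atomic_thread_fence(std::memory_order_release);
                                    // barrier: the payload write may not
                                    // sink below this point
    ready.store(1, std::memory_order_relaxed);
}

int consumer() {
    while (ready.load(std::memory_order_relaxed) == 0)
        ;                           // spin until the flag is seen
    std::atomic_thread_fence(std::memory_order_acquire);
                                    // barrier: the payload read may not
                                    // rise above this point
    return payload;                 // guaranteed to see 42
}
```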

jalf
  • 243,077
  • 51
  • 345
  • 550
  • 1
    +1, I have just one quibble: `volatile` only guarantees the compiler won't re-order accesses w.r.t. other volatile accesses. If the variable marked volatile is in main memory (not memory-mapped hardware) then the CPU can still re-order accesses, because the CPU has no idea if the source code said it was volatile or not. As far as the CPU is concerned it's just an address in memory. That means that volatile variables cannot be relied on for sequential consistency. I completely agree that once you add barriers to ensure correct ordering volatile is not needed, so it's no use in the first place. – Jonathan Wakely Dec 02 '15 at 11:16
0

I think it will eventually work on any platform, but I have no idea what delay you may see.

But honestly, it is really bad style to do a polling wait for an event. Even though you yield, your process will be rescheduled again and again without doing anything.

Since you already know how to place a variable somewhere accessible to both threads, why not use the right tools to do a wait that doesn't eat up resources? A pair of pthread_mutex_t and pthread_cond_t will do the trick nicely.
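A minimal sketch of the mutex/condition-variable pairing this answer suggests (the function names are mine; the `while` loop around the wait guards against spurious wakeups):

```cpp
#include <pthread.h>

int v = 0;                                    // now protected by the mutex
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

void* setter(void*) {                         // thread 1
    pthread_mutex_lock(&lock);
    v = 1;
    pthread_cond_signal(&cond);               // wake the waiter
    pthread_mutex_unlock(&lock);
    return nullptr;
}

void* waiter(void*) {                         // thread 2: sleeps instead of
    pthread_mutex_lock(&lock);                // spinning
    while (v == 0)                            // recheck after every wakeup
        pthread_cond_wait(&cond, &lock);      // atomically releases the
    pthread_mutex_unlock(&lock);              // mutex while blocked
    return nullptr;
}
```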

Jens Gustedt
  • 76,821
  • 6
  • 102
  • 177