How to ensure variables are stored to memory before a different thread reads them

Question

UPDATE: I asked this question in another form (see below), and it got closed for being not constructive. Kind of a shame since the answers exactly dealt with what I asked (and solved my problem), but I'm new here so I will certainly try again to make it more constructive.

I am working in VC++, under Windows 7. My multi-threaded program assigns values to variables in one thread, then sends a signal via an event object to a different thread that is blocked, waiting for that signal. Owing to things like optimizations contributed by the compiler, there is no guarantee that data assigned to a variable by one thread will actually be available to the other thread, even if one is sure (via the blocking mechanism) that the other thread will not attempt access until a time after the data has been assigned to the variable. For example, the value may be in a CPU register, remaining there until that register is needed for something else. This can avoid unnecessary loads from memory if the value is needed again soon after it was put into that register. Unfortunately, that means the corresponding location in memory continues to hold the last value it held prior to the new value being assigned. Thus, when the other thread unblocks, and accesses the memory holding the variable's value, it will obtain the old value, not the one most recently assigned.

The question, then, is: How does one Windows thread enforce storage to memory of values it assigns to variables, so that another thread is sure to have access to them at a later time? There may be several answers, but the one offered before this question was closed that seemed to be the best fit for what I needed was the use of a "memory fence," which was a programming construct I had not previously heard of. After the fence is encountered, pending writes to memory are guaranteed to have completed. (That's if the fence is a "write" fence; one can force a read from memory with a "read" fence, and one can do both with a "read/write" fence. Windows makes all three available quite easily within a VC++ program.)

One slight gotcha turned out to be that Windows fences (aka "memory barriers") only apply their guarantees to global, not local, storage (for reasons explained on the applicable MSDN pages).

If my interpretation here of how memory fences work is incorrect (and the moderators ever re-open this question), I'd be pleased to see that explained in the comments. I wouldn't ask if I weren't humble enough to admit I didn't know, after all. (If the moderators don't re-open it, but you can see I've got something wrong, please drop me an e-mail and let me know; I'll be glad to help keep this discussion alive at my blog, if you do.)

ORIGINAL VERSION
What's a good way to share data between threads?

I asked a question earlier about volatile variables that opened up an enormous learning experience for me. Among other things, I realized I wasn't asking the right question. Hope this isn't bad stackoverflow etiquette, but I think I should create a new question here that addresses my underlying issue:

I have two threads, A and B, in my Visual C++ program. B is blocked, waiting for a signal from A. A sets a number of variables. A then signals B, which will read the variables set by A. I am concerned that some of the variables set by A may not actually be written back to memory, as they may only reside in CPU registers.

What is a good way to be sure that thread B will, upon reading the variables previously set by thread A, read the values that thread A set?

I believe you will find that at its core this is what multithreading is all about. How do you manage shared data between threads? – Spencer Ruport, May 22 '12 at 17:04
You should look up [IPC (inter-process communication)](http://en.wikipedia.org/wiki/Inter-process_communication)... — Eitan T, May 22 '12 at 17:05
@Spencer: If you mean, "How do I synchronize access to prevent race conditions?" I think I've got that managed by having each thread block while waiting for a signal from the other. My particular problem is in being able to guarantee that, once unblocked, a thread will actually have access to values written to shared locations by the other thread. — Stevens Miller, May 22 '12 at 17:13
@Stevens : What blocking mechanism are you using? If a critical section or mutex, then you're already good to go, as those have implicit memory barriers on Windows. — ildjarn, May 22 '12 at 17:14
@Christopher: You're right. Any pointers for me on inter thread communication? That would appear to be the right name for what I am trying to do. Would this be a place to use a message queue, perhaps? — Stevens Miller, May 22 '12 at 17:15
@ildjarn: I am blocking on event objects. Each thread resets an object, signals the other object (which the other thread is blocked on, waiting for that signal), then waits for a signal on the object it has reset. The two threads alternate this way, effectively guaranteeing that only one of them is running at any given moment (except for the interval between signaling the other thread and then calling WaitForSingleObject, during which no access is made to any of the shared data involved). — Stevens Miller, May 22 '12 at 17:28
If you are serious about learning this, get someone to buy you this : http://www.amazon.com/C-Concurrency-Action-Practical-Multithreading/dp/1933988770 — Steve Townsend, May 22 '12 at 17:37
@Steve: No, I'm only kidding around. 8-) Yeah, sure, I'm always up for a new skill-set. I'll put it in my Wish List. Thanks. — Stevens Miller, May 22 '12 at 17:47
If you can really learn C++ threading well, you will be in high demand. — Steve Townsend, May 22 '12 at 18:02
That's valuable to know. Most of my work is as an expert witness. Something tells me that it will be a rare jury for which I must answer a question about C++ threading... Then again, harmony may break out all over, people will stop suing each other, and I will have to go back to an honest living. – Stevens Miller, May 22 '12 at 18:19

Christopher Oezbek · Accepted Answer · 2012-05-22T17:29:29.083

3

Under an x86 architecture there is not much to worry about when using a good library.

Guard access to the shared data with a mutex (for instance boost::mutex). If the implementor of the mutex did it right, it will use a memory barrier (see "Memory Barriers" on MSDN) so that writes made while holding the lock are visible to the next thread that acquires it.

If you had to write your own sync code, then add memory barriers to it.
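To make the mutex suggestion concrete, here is a minimal sketch in portable C++11. It assumes `std::mutex` and `std::condition_variable` in place of `boost::mutex` and the Windows event objects from the question; the `Shared` struct and `run_mutex_handoff` are hypothetical names chosen for illustration. Thread A writes under the lock and signals, and thread B waits and reads under the same lock.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical shared state; the names are illustrative only.
struct Shared {
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // guarded by m
    int value = 0;       // guarded by m
};

// Thread A publishes a value under the lock; thread B waits for it.
int run_mutex_handoff() {
    Shared s;
    int seen = -1;

    std::thread b([&] {
        std::unique_lock<std::mutex> lock(s.m);
        // wait() releases the mutex while blocked and re-acquires it
        // before returning, so the read below happens under the lock.
        s.cv.wait(lock, [&] { return s.ready; });
        seen = s.value;
    });

    std::thread a([&] {
        {
            std::lock_guard<std::mutex> lock(s.m);
            s.value = 100;
            s.ready = true;
        }  // unlocking makes both writes visible to the next locker
        s.cv.notify_one();
    });

    a.join();
    b.join();  // join() synchronizes, so reading `seen` here is safe
    return seen;
}
```

Called from `main`, `run_mutex_handoff()` should return 100. No explicit barrier appears in the code because the lock and unlock operations supply the ordering.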

edited May 22 '12 at 17:29

answered May 22 '12 at 17:20

Christopher Oezbek


That sounds promising, but my first quick scan suggests those mechanisms are for dealing with atomicity and reordering. I believe my synchronization methods are already coping with those issues. – Stevens Miller May 22 '12 at 17:33
Accepting this answer since it was first to mention memory barriers, but thanks and an up-vote to Mehrdad for his answer, too. – Stevens Miller May 22 '12 at 17:57
If you're using mutices, you've already designed wrong X_X – djechlin May 22 '12 at 18:07
@jjechlin: As a contraction of MUTual EXclusion coerced into a noun, the plural of mutex is *mutexes*. *Mutices* would only be correct if the word had a direct Latin derivation ending in -ex ... and it is just very ugly ;-) – Clifford May 22 '12 at 18:15
@StevensMiller The documentation is kind of bad, but `void MemoryBarrier(void);` is a processor command which ensures that *memory accesses* are not reordered and that caches are flushed. – Christopher Oezbek May 23 '12 at 07:49

user541686 · Answer 2 · 2012-05-22T17:48:22.397

1

You mentioned in a comment: "My particular problem is in being able to guarantee that, once unblocked, a thread will actually have access to values written to shared locations by the other thread."

I believe the answer to your question is simple: you can use _ReadWriteBarrier() (or, in your particular case, probably just _WriteBarrier inside the reading threads will do) to ensure that you read up-to-date memory values.

Note that, as far as I know, in C/C++, volatile is not guaranteed to have any memory barrier semantics -- so you can't simply use volatile in those languages. Memory barriers are the way to go for simply reading up-to-date values.
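As a rough illustration of where barriers go, here is a sketch using the standard C++11 `std::atomic_thread_fence` rather than the Visual C++ intrinsics named above. That substitution is an assumption made for portability, and `payload`, `ready`, and `run_fence_handoff` are hypothetical names. The writer fences after its write and before raising the flag; the reader fences after seeing the flag and before reading the data.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                 // ordinary, non-atomic shared variable
std::atomic<bool> ready(false);  // hypothetical flag standing in for the event

int run_fence_handoff() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread reader([&] {
        // Spin until the writer raises the flag.
        while (!ready.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        // Acquire fence: later reads cannot move before the flag load.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = payload;
    });

    std::thread writer([] {
        payload = 100;
        // Release fence: the write above cannot move past the flag store.
        std::atomic_thread_fence(std::memory_order_release);
        ready.store(true, std::memory_order_relaxed);
    });

    writer.join();
    reader.join();
    return seen;
}
```

The two fences pair up through the flag: once the reader observes `ready == true`, the write to `payload` is guaranteed to be visible, so `run_fence_handoff()` should return 100.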

edited May 22 '12 at 17:48

answered May 22 '12 at 17:19

user541686


Yeah, it was `volatile` that got me into this fix in the first place. While it does seem to guarantee a degree of certainty regarding variables being written to memory upon every change (and, _please_ don't anyone jump down my throat over that; I have already realized that `volatile` is a touchy subject 8-) ), it isn't a fix to the general problem of knowing if values set by one thread before that thread deblocks another will be available to the deblocked thread. Thanks for the lead on memory barriers. I will do more studying on that. – Stevens Miller May 22 '12 at 17:36
@StevensMiller: Sure! Also, [this thread](http://stackoverflow.com/questions/2484980/why-is-volatile-not-considered-useful-in-multithreaded-c-or-c-programming) might have a better explanation than mine. – user541686 May 22 '12 at 17:37
That thread says, "memory barriers also ensure that all pending reads/writes are executed when the barrier is reached, so it effectively gives us everything we need by itself, making volatile unnecessary. We can just remove the volatile qualifier entirely." That sounds like a fix, if, by "pending reads/writes," it means reads/writes from/to _memory_. Is that what it means? I'll dig into the MSDN and see if I can confirm that. Thanks again. – Stevens Miller May 22 '12 at 17:51
@StevensMiller: Yes that's what it means -- registers are unaffected. – user541686 May 22 '12 at 22:49
That seems to have worked! I have dumped all use of `volatile` and simply added `_WriteBarrier()` before signaling the blocked thread. Now, you mentioned adding `_WriteBarrier()` inside the _reading_ threads. That thread has no pending writes. Would it still work there? Would it work regardless of which thread I put it in? Thanks for all the help. I have learned a lot today! – Stevens Miller May 23 '12 at 01:32
@StevensMiller: Sure, glad you learned something new! I'm actually not 100% sure -- I *think* `_WriteBarrier` should be indeed inside the *reading* threads, because the goal is that the read is getting the most up-to-date value. However, I think the *writing* thread would also need a *`_ReadBarrier`*, because it needs to ensure that reading threads are getting the most up-to-date value. I haven't done enough multithreaded code before to know whether this is 100% accurate (usually I just use `_ReadWriteBarrier` when I'm in doubt, haha) but that's my current understanding. – user541686 May 23 '12 at 03:18
@StevensMiller: Actually, it seems I was wrong -- MSDN's example indeed tells you to put the `_WriteBarrier` *after the write*. I guess that means I'm not so sure when a `_ReadBarrier` should be issued then -- I might be wrong on that too. Might want to look it up or ask someone else (or just use `_ReadWriteBarrier()` on both, since you can't go wrong with that, haha). – user541686 May 23 '12 at 03:22

score 0 · Answer 3 · answered May 22 '12 at 17:13

This is like asking "what's a good way to write an object-oriented program." Except to that question I would say "go read a good book," but to this one, there really isn't a good book on a bad paradigm. A lot of multithreaded programming is based on minimizing use of shared data rather than using it well.

So, my suggestions are:

1) Design so that no two threads need to communicate with each other in this way. Sounds more like one procedural thread is going on here than two truly independent threads.

2) Implement a service-oriented architecture within your process or between processes. Make all shared data occur over ephemeral request/response patterns rather than relying on the use of global variables that are polled. All these variables that A sets and tells B to read sounds a lot like a "request" that "client" A is sending to "server" B.

3) If you're OK with installing and putting effort into learning a library, I recommend ZMQ. I've had good experience with it and they advertise (and in my experience deliver) their tool, which on the surface looks like a library to implement clients and servers in, as a way to get rid of all shared data between threads. If nothing else the documentation might give you good ways to think about cashing in your shared data between threads for patterns that don't involve them instead.
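The request/response idea in suggestion 2 can be sketched inside a single process without any external library. The following is a minimal, assumed design and not ZMQ's API: `BlockingQueue`, `Request`, and `run_request_reply` are hypothetical names. The "client" pushes a request onto one queue and the "server" thread answers on another, so the only shared state is the pair of queues.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical request type: two operands for the server to add.
struct Request { int a; int b; };

// A small blocking queue; all access to q_ happens under m_.
template <typename T>
class BlockingQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(item));
        }
        cv_.notify_one();
    }

    // Blocks until an item is available, then removes and returns it.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        T item = std::move(q_.front());
        q_.pop();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
};

// The client sends one request and waits for the reply.
int run_request_reply(int a, int b) {
    BlockingQueue<Request> requests;
    BlockingQueue<int> replies;

    std::thread server([&] {
        Request r = requests.pop();  // blocks until the client sends
        replies.push(r.a + r.b);     // the reply travels the other way
    });

    requests.push(Request{a, b});
    int reply = replies.pop();
    server.join();
    return reply;
}
```

For example, `run_request_reply(2, 3)` should return 5. Each request has a limited lifetime and scope, which is the property this answer is arguing for.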

answered May 22 '12 at 17:13

djechlin


I get you, but I tried to be specific about two synched threads and the particular issue of being sure values cached in registers are written to memory before deblocking a waiting thread. I think that's a _little_ less vague than "what's a good way to write an object-oriented program?" – Stevens Miller May 22 '12 at 17:22
WRT #1: You are correct, but the thread I am communicating with is created by a Windows API call. I can't design that out of my program. #2: My globals aren't polled; I synchronize the threads with each alternately waiting for the other to tell it to deblock. #3: I love a good library, but I can't redesign the structure of the system I am working with. – Stevens Miller May 22 '12 at 17:25
You are describing a request/reply pattern. The shared data is a request in one direction and a reply in another. The "deblock" command is when you're sending the request, and the "block" is listening for a request. That's how I would implement it, so that all shared data has limited lifetime and limited scope. – djechlin May 22 '12 at 17:32
That sounds like a match to what I am doing. Limited lifetimes and scopes are no problem, afaik. Got a reference or pointer for me on how to implement the pattern in a way that guarantees that the reads/writes to the shared data will all be from/to the same locations (that is, that will avoid the register-caching issue I'm addressing)? – Stevens Miller May 22 '12 at 17:40
If I understand right - the simple way to do this is a global RequestFromA* a_req; variable that B will access to find the data from A to use in the request. Problem is this only works if there is exactly one of A and one of B. The pattern you want is the "mediator pattern", though (see Design Patterns or the google.) – djechlin May 22 '12 at 18:05

score 0 · Answer 4 · edited May 23 '17 at 12:11

"If you mean, 'How do I synchronize access to prevent race conditions?' I think I've got that managed by having each thread block while waiting for a signal from the other. My particular problem is in being able to guarantee that, once unblocked, a thread will actually have access to values written to shared locations by the other thread."

Yes, exactly. The problem is that waiting on a signal set by some thread is not enough to ensure that any of that thread's other activities are visible from the current thread. A thread can set a variable, trigger the signal, and then a thread waiting on the signal can access the variable, but get a completely different value.

I'm currently enjoying Anthony Williams' book, C++ Concurrency in Action, on this topic. The answer seems to lie in using std::atomic memory orders correctly. Here's an example:

std::atomic<bool> signal(false);
std::atomic<int> i(0);

-- thread 1 --
i.store(100,std::memory_order_relaxed);
signal.store(true,std::memory_order_release);

-- thread 2 --
while(!signal.load(std::memory_order_acquire));
assert(i.load(std::memory_order_relaxed) == 100);

When the second thread sees the signal, a relationship is established between the store performed with memory_order_release and the load performed with memory_order_acquire which guarantees that the store to i will be visible in the second thread. Thus the assertion is guaranteed to hold.
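For anyone who wants to run the release/acquire example, here is one possible self-contained version. The `std::thread` scaffolding and the function name `run_release_acquire` are additions for illustration, and the flag is renamed `signal_flag` to avoid any clash with the C library's `signal` function.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int run_release_acquire() {
    std::atomic<bool> signal_flag(false);
    std::atomic<int> i(0);
    int seen = -1;

    // Thread 2: spin on the flag with acquire, then read i.
    std::thread t2([&] {
        while (!signal_flag.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = i.load(std::memory_order_relaxed);
    });

    // Thread 1: write i, then publish with a release store.
    std::thread t1([&] {
        i.store(100, std::memory_order_relaxed);
        signal_flag.store(true, std::memory_order_release);
    });

    t1.join();
    t2.join();
    return seen;
}
```

Because the acquire load that reads `true` synchronizes with the release store, `run_release_acquire()` should return 100.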

On the other hand, if you use less strict memory orders then you don't get any guarantees.

-- thread 1 --
i.store(100,std::memory_order_relaxed);
signal.store(true,std::memory_order_relaxed);

-- thread 2 --
while(!signal.load(std::memory_order_relaxed));
int i2 = i.load(std::memory_order_relaxed);
// No guarantees about the value loaded from i!

Alternatively you can just use the default memory order which guarantees sequential consistency as long as you don't have any data races.

std::atomic<bool> signal(false);
int i = 0;

-- thread 1 --
i = 100;
signal = true;

-- thread 2 --
while(!signal);
assert(i == 100);
