Memory access in C++ multithreading environment

Question

I have an issue where a function in main thread is blocked until a local variable is set in another thread. I use semaphore to block the main thread execution:

int sendRequest(Message request, void *data)
{
semaphore waitForReply;
volatile int result = 0;
volatile int retData[100];
client->sendRequest(request, [&result, &retData](const Message& reply){ // Callback is called from a different thread
   result = reply.result;
   memcpy(retData, reply.retData, sizeof(retData));
   waitForReply.signal();
})
waitForReply.wait();
//At this line we want result var to be updated.
memcpy(data, retData, sizeof(retData));
return result;
}

Question is does using volatile int result guarantee that the returning result is the actual value received from the callback? and is this the best way to solve this issue or using normal variable and mutex lock is better?

How about the case with array retData? (please don't mind the size of the array)

Related: [What's the difference of the usage of volatile between C/C++ and C#/Java?](https://stackoverflow.com/questions/19923352/whats-the-difference-of-the-usage-of-volatile-between-c-c-and-c-java) — , Jul 20 '17 at 02:38
Your code looks correct to me. The `volatile` isn't doing anything useful, you should remove it. Your callback signals the semaphore only after the data has been copied out, so you are guaranteed to have correct data after `wait()` returns. Or, to put in another way, it's not the `volatile` that guarantees correct behavior here. The guarantee comes from `signal()` being called after the data has been copied, and that `wait()` will block and wait for that signal. — Nikos C., Jul 20 '17 at 03:13
@NikosC. this is the thing I am unsure, please correct me if I am wrong. What if when result = reply.result;, the new value of result is not yet written to memory but cached somewhere in register, then the returning result (return result;) is an old value? — Maluvel, Jul 20 '17 at 03:18
This won't happen. The code writes directly to result, since you passed it by reference. There is no reason for the compiler to go crazy and change the order of your statements. — Michaël Roy, Jul 20 '17 at 03:22
@Maluvel `result` is guaranteed to contain the data you think it does, since you are waiting on a semaphore. Even if the compiler's optimizer is holding the value in a register, it will generate code that still behaves correctly. For example, in the `return` statement, the compiler might fetch the value from that register. Whenever you wait on a semaphore or on a mutex lock, the compiler guarantees that any data after that is going to be synchronized correctly, regardless of how that synchronization is being achieved by the optimizer. — Nikos C., Jul 20 '17 at 03:49
@NikosC. thank you! this answer clears my doubt "Whenever you wait on a semaphore or on a mutex lock, the compiler guarantees that any data after that is going to be synchronized correctly" — Maluvel, Jul 20 '17 at 04:05

zneak · Accepted Answer · 2017-07-20T04:52:36.900

Volatile isn't useful in C or C++ for multithreaded code because it does not create memory barriers. However, there's nothing in your code that requires a memory barrier, so you're already fine.

An example of bad code would be:

// initially
my_pointer = nullptr;
ready_flag = false;

// thread 1
my_pointer = some_pointer_here;
ready_flag = true;

// thread 2
while (!ready_flag) /* wait */;
my_pointer->foo = bar;

This code is dangerous because a wide range of things can make the write to ready_flag become visible before the write to my_pointer, even though it appears second in the source code. In that case, you could access my_pointer before it's assigned a value.

A memory barrier is an instruction that enforces a specific order for when memory writes become visible. For instance, that code would need a memory barrier between the write to my_pointer and the write to ready_flag. The C++ standard library offers std::atomic_thread_fence for that purpose. Additionally, std::atomic types can all generate memory barriers.

In your case, your variables have no interdependencies (other than that the entire operation must have completed), and the semaphore's wait method almost certainly has a barrier anyway.

The exact way of achieving cache and memory consistency between cores/CPUs depends on the platform. AFAIK, on x86 and ARM derivatives, you don't need to do anything at all. On crazier platforms, this is generally taken care of by memory fences, which you would be using anyway.

thank you for the answer, that "semaphore wait method guarantees memory barrier" solves my question. — Maluvel, Jul 20 '17 at 04:08

Michaël Roy · Answer 2 · 2017-07-20T02:56:55.790

2

The data in the resulting buffer is good. The call to memcpy in the lambda, insures the data is copied in its entirety before the semaphore waitForSignal is signaled. But...

No, the volatile keyword does nothing in this instance. memcpy is not affected by the volatile keyword. volatile only affects how the code is optimized.
Why don't you capture data and have the lambda fill that buffer directly? this would save a copy.
I don't see any parameter that states the size of the output buffer data. This is not very good practice. I suggest you consider using a std::vector or some other safe way to implement buffers.

edited Jul 20 '17 at 02:56

answered Jul 20 '17 at 02:45

Michaël Roy

6,338
1
15
19

I want when the code passes by this line (//At this line we want result var to be updated.), the values of data[] and result are updated. If I capture data and do memcpy with it, can it happen that when the function returns, the value of data[] is not yet updated? If volatile does not help to guarantee that, what could be done to achieve above expected result? Please dont mind the size (3), this is just an example. – Maluvel Jul 20 '17 at 02:49

Memory access in C++ multithreading environment

2 Answers2