5

Can external I/O be relied upon as a form of cross-thread synchronization?

To be specific, consider the pseudocode below, which assumes the existence of network/socket functions:

int a;          // Globally accessible data.
socket s1, s2;  // Platform-specific.

void thread1();
void thread2();

int main() {
  // Set up + connect two sockets to (the same) remote machine.
  s1 = ...;
  s2 = ...;

  std::thread t1{thread1}, t2{thread2};
  t1.join();
  t2.join();
}

void thread1() {
  a = 42;
  send(s1, "foo");
}

void thread2() {
  recv(s2);     // Blocking receive (error handling omitted).
  f(a);         // Use a, should be 42.
}

We assume that the remote machine only sends data to s2 upon receiving the "foo" from s1. If this assumption fails, then certainly undefined behavior will result. But if it holds (and no other external failure occurs like network data corruption, etc.), does this program produce defined behavior?

"Never", "unspecified (depends on implementation)", "depends on the guarantees provided by the implementation of send/recv" are example answers of the sort I'm expecting, preferably with justification from the C++ standard (or other relevant standards, such as POSIX for sockets/networking).

If "never", then changing a to be a std::atomic<int> initialized to a definite value (say 0) would avoid undefined behaviour, but then is the value guaranteed to be read as 42 in thread2 or could a stale value be read? Do POSIX sockets provide a further guarantee that ensures a stale value will not be read?

If "depends", do POSIX sockets provide the relevant guarantee to make it defined behavior? (How about if s1 and s2 were the same socket instead of two separate sockets?)
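For concreteness, here is a runnable sketch of the atomic variant asked about above. A `socketpair` stands in for the network round-trip (an assumption on my part: the original program uses two sockets to a remote machine, and the remote echo is collapsed here), and acquire/release orderings are used on `a`:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <sys/socket.h>
#include <unistd.h>

std::atomic<int> a{0};  // Definite initial value: no UB even if a stale value is read.
int fds[2];             // fds[0] plays the role of s1, fds[1] of s2.
int observed;           // What thread2 saw.

void thread1() {
    a.store(42, std::memory_order_release);
    ::send(fds[0], "foo", 3, 0);
}

void thread2() {
    char buf[3];
    ::recv(fds[1], buf, sizeof buf, MSG_WAITALL);  // Blocking receive.
    // In practice this reads 42; whether that is *guaranteed* is exactly
    // the question being asked.
    observed = a.load(std::memory_order_acquire);
}

int run_demo() {
    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);  // Stand-in for the remote machine.
    std::thread t1{thread1}, t2{thread2};
    t1.join();
    t2.join();
    close(fds[0]);
    close(fds[1]);
    return observed;
}
```

(`run_demo` returns the value `thread2` observed; on real implementations this is 42, but the point of the question is whether any standard mandates it.)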

For reference, the standard I/O library has a clause which seems to provide an analogous guarantee when working with iostreams (27.2.3¶2 in N4604):

If one thread makes a library call a that writes a value to a stream and, as a result, another thread reads this value from the stream through a library call b such that this does not result in a data race, then a’s write synchronizes with b’s read.

So is it a matter of the underlying network library/functions being used providing a similar guarantee?

In practical terms, it seems the compiler can't reorder accesses to the global a with respect to the send and recv calls (since, for all it knows, those functions could access a). However, the thread running thread2 could still read a stale value of a unless some kind of memory barrier / synchronization guarantee is provided by the send/recv pair itself.
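The compiler-reordering point can be illustrated with a hypothetical opaque call (`send_opaque` is my stand-in for send; the `noinline` attribute is GCC/Clang-specific and just keeps the definition opaque within one file):

```cpp
#include <cassert>

int a;

// Stand-in for send(): for all the caller knows, this could read or
// write the global `a` (e.g. if it were defined in another translation unit).
__attribute__((noinline)) void send_opaque(const char*) {}

void thread1_like() {
    a = 42;              // May not be sunk past the opaque call below.
    send_opaque("foo");
}
```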

ndkrempel
  • 1,906
  • 14
  • 18
  • It would also be interesting if someone knows whether this point is addressed by the C++ TS Extensions for Networking (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4711.pdf). Can't see anything relevant there on first browsing. – ndkrempel Feb 28 '18 at 16:50
  • Related: https://stackoverflow.com/questions/10698253/is-function-call-an-effective-memory-barrier-for-modern-platforms – ndkrempel Mar 01 '18 at 11:38

2 Answers

1

Short answer: No, there is no generic guarantee that a will be updated. My suggestion would be to send the value of a along with "foo" - e.g. "foo, 42", or something like it. That is guaranteed to work, and probably not significant overhead. [There may of course be other reasons why that doesn't work well]
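A minimal sketch of that suggestion: carry the value inside the message itself, so no cross-thread visibility guarantee is needed at all. The `snprintf`/`sscanf` framing here is purely illustrative, not a wire-format recommendation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Sender side: pack the value into the payload before send().
int pack(char *buf, std::size_t len, int value) {
    return std::snprintf(buf, len, "foo,%d", value);
}

// Receiver side: recover the value from the payload after recv().
bool unpack(const char *buf, int *value) {
    return std::sscanf(buf, "foo,%d", value) == 1;
}
```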

Long rambling stuff that doesn't really answer the problem:

Global data is not guaranteed to be "visible" immediately in different cores of multicore processors without further operations. Yes, most modern processors are "coherent", but not all models of all brands guarantee this. So if thread2 runs on a processor that has already cached a copy of a, it cannot be guaranteed that the value of a is 42 at the point when you call f.

The C++ standard requires the compiler to assume that an opaque function call such as recv may access or modify the global a, so the load of a must happen after the call; the compiler is not allowed to do:

 tmp = a;
 recv(...);
 f(tmp);

but as I said above, cache operations may be needed to guarantee that all processors see the same value at the same time. If send and recv take long enough or touch enough memory [there is no precise measure of how long or how much], you may see the correct value most or even all of the time, but for ordinary (non-atomic) types there is no guarantee that writes are actually visible outside the thread that last wrote the value.

std::atomic will help on some types of processors, but there is no guarantee that the change is "visible" in a second thread or on a second processor core within any particular time after it was made.

The only practical solution is to have some kind of "repeat until I see it change" code. This may require two values: one that is (for example) a counter, and one that is the actual value, so that you can distinguish "a is now 42" from "I've set a again, and it's 42 this time too". If a represents, for example, the number of data items available in a buffer, it is probably "it changed value" that matters, and it's enough to check "is this the same as last time?". The std::atomic operations have ordering guarantees that let you ensure "if I update this field, the other field is guaranteed to appear at the same time or before it". So you can use them to publish a pair of items: "there is a new value" (for example a counter indicating the "version number" of the current data) and "the new value is X".
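The counter-plus-value pattern above can be sketched like this (names are mine; a real implementation might yield or use a futex/condition variable instead of spinning):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<unsigned> version{0};  // "Version number" of the current data.
std::atomic<int> value{0};         // The actual payload.

// Writer: store the payload, then bump the counter with release ordering,
// so a reader that sees the new counter also sees the new payload.
void publish(int v) {
    value.store(v, std::memory_order_relaxed);
    version.fetch_add(1, std::memory_order_release);
}

// Reader: "repeat until I see it change", then read the payload.
int wait_for_change(unsigned last_seen) {
    while (version.load(std::memory_order_acquire) == last_seen) {
        // Spin until the version number moves on.
    }
    return value.load(std::memory_order_relaxed);
}
```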

Of course, if you KNOW what processor architectures your code will run on, you can plausibly make more advanced guesses as to what the behaviour will be. For example all x86 and many ARM processors use the cache-interface to implement atomic updates on a variable, so by doing an atomic update on one core, you can know that "no other processor will have a stale value of this". But there are processors available that do not have this implementation detail, and where an update, even with an atomic instruction, will not be updated on other cores or in other threads until "some time in the future, uncertain when".

Mats Petersson
  • 126,704
  • 14
  • 140
  • 227
  • Do you think making `a` a `std::atomic` initialized to `0` is enough to guarantee `42` will be read? I don't see how using an atomic provides sufficient visibility guarantees (beyond the mandated "visible in a reasonable amount of time"). Reference: 29.3¶12 in N4604, `Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.`. – ndkrempel Feb 28 '18 at 17:13
  • No, you will need to set the `std::atomic` to 42 somewhere. But yeah, atomics do not GUARANTEE that either, I will delete the answer, as it's misleading. There is no way to guarantee what you want, as far as I'm aware. – Mats Petersson Feb 28 '18 at 17:17
  • Feel free to edit your answer to include this discussion. I think it's still useful even if it doesn't fully answer the original question. (I was assuming the assignment `a = 42` is still present.) – ndkrempel Feb 28 '18 at 17:17
  • I've edited the answer, but I'm not sure it's made it really that much better. It's a hard problem to solve. – Mats Petersson Feb 28 '18 at 17:33
1

In general, no, external I/O can't be relied upon for cross-thread synchronization.

The question is out of scope for the C++ standard itself, as it involves the behavior of external/OS library functions. So whether the program has undefined behavior depends on any synchronization guarantees provided by the network I/O functions. In the absence of such guarantees, it is indeed undefined behavior. Switching to (initialized) atomics to avoid undefined behavior still wouldn't guarantee that the "correct", up-to-date value will be read. Ensuring that within the realms of the C++ standard would require some kind of locking (e.g. a spinlock or mutex), even though it seems like waiting shouldn't be required given the real-time ordering of the situation.

In general, the notion of "real-time" synchronization (involving visibility rather than merely ordering) required to avoid having to potentially wait after recv returns before loading a isn't supported by the C++ standard. At a lower level, this notion does exist, however, and would typically be implemented through inter-processor interrupts, e.g. FlushProcessWriteBuffers on Windows, or the membarrier system call on Linux. This would be inserted after the store to a, before the send in thread1. No synchronization or barrier would be required in thread2. (It also seems like a simple SFENCE in thread1 might suffice on x86 due to its strong memory model, at least in the absence of non-temporal loads/stores.)
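To show where the barrier would go, here is a placement sketch. A portable `std::atomic_thread_fence` stands in for the process-wide mechanism (FlushProcessWriteBuffers / membarrier); note the stand-in only orders this thread's own accesses and is strictly weaker than the IPI-based barriers named above:

```cpp
#include <atomic>
#include <cassert>

int a;

void store_then_signal() {
    a = 42;
    // The process-wide barrier (or SFENCE on x86) would go here,
    // between the store and the send; thread2 needs nothing extra.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    // send(s1, "foo") would follow.
}
```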

A compiler barrier shouldn't be needed in either thread for the reasons outlined in the question (call to an external function send, which for all the compiler knows could be acquiring an internal mutex to synchronize with the other call to recv).

Insidious problems of the sort described in section 4.3 of Hans Boehm's paper "Threads Cannot be Implemented as a Library" should not be a concern as the C++ compiler is thread-aware (and in particular the opaque functions send and recv could contain synchronization operations), so transformations introducing writes to a after the send in thread1 are not permissible under the memory model.

This leaves the open question of whether the POSIX network functions provide the necessary guarantees. I highly doubt it, as on some of the architectures with weak memory models, they are highly non-trivial and/or expensive to provide (requiring a process-wide mutex or IPI as mentioned earlier). On x86 specifically, it's almost certain that accessing a shared resource like a socket will entail an SFENCE or MFENCE (or even a LOCK-prefixed instruction) somewhere along the line, which should be sufficient, but this is unlikely to be enshrined in a standard anywhere. Edit: In fact, I think even the INT to switch to kernel mode entails a drain of the store buffer (the best reference I have to hand is this forum post).

ndkrempel
  • 1,906
  • 14
  • 18