relaxed ordering as a signal

Question

Let's say we have two threads: one that gives a "go" signal and one that waits for that signal before producing something.

Is this code correct, or can I get an "infinite loop" because of caching or something like that?

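#include <atomic>
#include <thread>

void produce_data(); // defined elsewhere

std::atomic_bool canGo{false};

void producer() {
    // Spin until the launcher gives the go signal.
    while (!canGo.load(std::memory_order_relaxed)) {}
    produce_data();
}

void launcher() {
    canGo.store(true, std::memory_order_relaxed);
}

int main() {
    std::thread a{producer};
    std::thread b{launcher};
    // A joinable std::thread must be joined (or detached) before it is destroyed.
    a.join();
    b.join();
}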

If this code is not correct, is there a way to flush or invalidate the cache in standard C++?

I needed to refresh my knowledge of this. Very helpful: https://www.modernescpp.com/index.php/fences-as-memory-barriers https://www.modernescpp.com/index.php/acquire-release-fences — Researcher, Jul 05 '19 at 16:57
Also I just recalled that you can get away with a lot on x86 just by using compiler barriers: https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fences-on-an-x86/ — Researcher, Jul 05 '19 at 17:02
Thanks :-). I think I need to protect the canGo variable with acquire release semantic — Antoine Morrier, Jul 05 '19 at 18:16
Looks like a binary semaphore, `producer` is `acquire`, and `launcher` is `release`. — Evg, Jul 05 '19 at 18:42
In any case, no need for a release during the store as there's nothing else happening in the launcher where ordering matters. In the producer, as per that bartos link above, if you are using x86, load/store or load/load will not be reordered in the CPU. So all you need is a compiler fence to stop the while loop being re-ordered below the produce_data (atomic_signal_fence). You can even sometimes get away with using standard variables (GASP!): https://preshing.com/20130618/atomic-vs-non-atomic-operations/ However using Atomic variables is always safe regarding multiple instructions per op. — Researcher, Jul 05 '19 at 19:29

score 5 · Accepted Answer · answered Jul 05 '19 at 16:46

A go signal like this will usually be in response to some memory changes that you'll want the target to see.

In other words, you'll usually want to give release/acquire semantics to such signaling.

That can be done either by using memory_order_release on the store and memory_order_acquire on the load, or by putting a release fence before the relaxed store and an acquire fence after the relaxed load. Either way, memory operations done by the signaller before the store become visible to the signallee once it has seen the store (see, for example, https://preshing.com/20120913/acquire-and-release-semantics/ or the C/C++ standard).
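As a minimal sketch of the first option (release on the store, acquire on the load), assuming a hypothetical plain variable `payload` that the signaller fills in before giving the go; the names `payload`, `ready` and `run_release_acquire` are made up for illustration and are not from the question:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // the "go" signal

// Runs one signaller/signallee round and returns what the signallee read.
int run_release_acquire() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread signallee([&seen] {
        // Acquire load: once it reads true, everything written before
        // the matching release store is visible to this thread.
        while (!ready.load(std::memory_order_acquire)) {}
        seen = payload;
    });
    std::thread signaller([] {
        payload = 42;                                  // ordinary write
        ready.store(true, std::memory_order_release);  // publish it
    });

    signaller.join();
    signallee.join();
    return seen;
}
```

Calling `run_release_acquire()` from `main` should return the value written by the signaller. With a relaxed store and load instead, the loop would still terminate, but reading `payload` would be a data race.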

The way I remember the ordering of the fences is this: as far as I understand, shared memory operations among cores are effectively hardware-implemented buffered I/O that follows a protocol. A release fence is then roughly like an output buffer flush, and an acquire fence like an input buffer flush/sync.

Now, if you flush your core's memory-op output buffer before issuing a relaxed store, then by the time the target core sees that store, the preceding memory-op messages must already be available to it. All the target needs to do to see those changes in its own memory is to sync them in with an acquire fence after it observes the signalling store.
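The fence-based variant can be sketched the same way. This is only an illustration under assumed names (`shared_value`, `go` and `run_with_fences` are hypothetical), using `std::atomic_thread_fence` around relaxed operations on the flag:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int shared_value = 0;         // plain, non-atomic data
std::atomic<bool> go{false};  // the "go" signal

// Runs one signaller/signallee round and returns what the signallee read.
int run_with_fences() {
    shared_value = 0;
    go.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread signallee([&seen] {
        // Relaxed spin: only the flag itself is observed here.
        while (!go.load(std::memory_order_relaxed)) {}
        // Acquire fence after the load: "sync in" the earlier writes.
        std::atomic_thread_fence(std::memory_order_acquire);
        seen = shared_value;
    });
    std::thread signaller([] {
        shared_value = 7;
        // Release fence before the store: "flush out" the earlier writes.
        std::atomic_thread_fence(std::memory_order_release);
        go.store(true, std::memory_order_relaxed);
    });

    signaller.join();
    signallee.join();
    return seen;
}
```

The placement matters: the release fence comes before the relaxed store and the acquire fence comes after the relaxed load that observed it. Swapping either pair would no longer order the write to `shared_value` before the read.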

Petr Skocik

Maybe I forgot something in the question. I know that I need acquire and release operations when I need something like a producer/consumer. However, here I don't need to see any value other than `canGo` :-) – Antoine Morrier Jul 05 '19 at 17:59
@AntoineMorrier Then you don't need the fences. – Petr Skocik Jul 05 '19 at 18:16
I think I need to protect the canGo variable with acquire/release semantics to avoid any problems – Antoine Morrier Jul 05 '19 at 18:17
@AntoineMorrier The atomic loads and stores will be atomic (naturally) and the standard requires that the implementation make them visible "within a reasonable amount of time" (http://port70.net/~nsz/c/c11/n1570.html#7.17.3p16 for C. I'm sure C++ has something similar). Consequently it's impossible to encounter the `canGo` variable with a torn/trap value, and because the changes must propagate "within a reasonable amount of time" (practically <1µs), infinite looping is out of the question. That said, a stricter acquire/release certainly shouldn't do any harm. – Petr Skocik Jul 05 '19 at 18:29
@AntoineMorrier I wouldn't expect acquire/release fences here to have much of a performance impact either, especially around thread creation, which takes quite a few µs (around 20 on my Linux laptop). – Petr Skocik Jul 05 '19 at 18:31
I see. So if, let's say, I have something like an `if(canGo)` instead of a while, and the variable is set by a user (via a GUI), I cannot have something like an "if failing" because it reads trap data? – Antoine Morrier Jul 05 '19 at 22:08
@AntoineMorrier Trap/torn data is just a theoretical possibility on some architectures iff the variable isn't atomic. Yours is so you don't need to worry about that. – Petr Skocik Jul 05 '19 at 22:14
@PSkocik, you will need a read-acquire because that prevents any subsequent writes (in the `produce_data()` function for example) from being moved before the read-acquire. The write to `canGo` can be relaxed though as there are no ordering constraints, just inter-thread visibility. – Eric Jul 05 '19 at 22:37
@Eric `produce_data()` should be control-dependency-ordered after the `true` value from the variable is read. I don't think you _need_ the fences because of that but I do think they're a good idea nonetheless. – Petr Skocik Jul 05 '19 at 22:53
@PSkocik, would consume semantics on the load be more appropriate in that case? – Eric Jul 06 '19 at 06:58
@Eric That would be useful if the dependency were a data dependency. But this would be a _control_ dependency, and those, AFAIK, don't need to be explicitly ordered because, AFAIK, C/C++ ban speculative writes, which is exactly what would need to happen for `produce_data()` to move up. – Petr Skocik Jul 06 '19 at 11:04
@PSkocik, ah okay, yeah, makes sense that writes cannot be hoisted above the load because that would be speculative. – Eric Jul 06 '19 at 14:07
