0

I've seen the argument that the reson why it is named "weak", is because the "strong" version has happens-before "orderings" behavior.

In my understanding "Happens before" is a term that denotes memory consistency, NOT sequential execution OR strong memory ordering, it always confuses me since people talk interchangeably about these things.

If you think deeply about reordering and how such an interaction would relate to the LOCK prefix, it simply makes little sense.

There are degrees of reordering, one that happens at compile and the other at runtime. If some sort of mechanism would kick at the LOCK prefix... How would that look like?... a runtime inlining of the spinlock while fencing the entire while loop for each of the concurrent lines of code accessing it??

My understanding of what the Happens-Before behavior entails, is that it ENSURES to tell you what happened before your current action... BUT this current action MAY OR MAY NOT follow a sequential consistency of execution among other concurrent actions AND the code the actions is comprised of may have been reordered prior to such execution... meaning that race conditions still apply and... NONE OF THIS INVOLVES reordering.

If we Have 2 twin threads: same code, same shared memories, their code may been reordered FROM THE START, even before reaching the atomic pressure point, and a strong compareAndSet will not prevent this reordering(?), it will simply ensure the happens-before as in:

"A losing Thread can ATTEST that this was the memory's state (witness) before it arrived"-type of Happens-Before... the only type of happens-before.

This is NOT reordering; this is NOT sequential consistency of execution.

According to the LOCK prefix, once the instruction attempts a BUS entrance and sees it as being locked, it will NOT re-attempt to acquire it(?), it will simply escape with a failure(?).

If this is true, then the retry-acquire behavior IS NOT INTRINSIC to the LOCK prefix. and it becomes a responsibility of the higher level to attempt a retry.

If the LOCK + cmpxchg combo applied by the Java's strong compareAndSet ensures a "happens-before" it means that when the LOCK acquiring fails, a native spin will RETRY to acquire the lock, and only when acquired, it is able to stablish a mutually exclusive read and write coupling... the term (LL/SC) seems to be an abstraction describing the resulting behavior, but NOT the "HOW ITS ACHIEVED", and I haven't seen literature explaining it at depth.

But the Happens Before is NOT intrinsic to the cmpxchng it is a consequence of the RETRY attempt spin that, without it, it would fail and escape to your side (the side of the programmer) with no witness, just a failure.

And so, could we say that "strong" CAS-sing is contentious, because it spins until the BUS is freed(??).

If we analyze the implication that weakCompareAndSet may fail EVEN if the expected value matches, this would fit into this interpretation since, the instruction fails simply by encountering contention (a locked BUS), If the bus is locked it doesn't matter which state it's memory is or will be because it will not reattempt to acquire the lock, so if it's future memory state matches the expected value, it won't matter because what matters for the weak version is that it ENCOUNTERED CONTENTION.

And so, could we say that "weak" CASing is non-contentious, because it will return if the BUS is locked(??).

IMO the best evidence in favor of this argument is that EVERY part of the Java source code that uses the weakCompareAdnSet is ALWAYS done on superficial spinlock situations that DO NOT read a constant value, like a true, a false, an enum or constant... instead it is always done on a variable that is read via high-level volatile read, and which failure will always be followed up by a retry high-level spinlock.

This means that they ARE SURE that the value of the expected obtained via high-level volatile read, can surely be dismissed when contention is met, so they escape to the superficial level (programmer's side) as soon as possible and reattempt... instead of natively spinning to acquire a LOCK on a cmpxchg that will surely return a false on the comparison check.

Is my interpretation of this true? If you have some literature that I could read, it would be greatly appreciated, thanks.

Delark
  • 1,141
  • 2
  • 9
  • 15
  • 1
    x86 `lock cmpxchg` doesn't have spurious failures. It waits for exclusive ownership of the cache line. (It doesn't need to lock the whole bus, though; that was only true in much older CPUs). There's no difference between CAS_weak and CAS_strong on x86, only on ISAs like ARMv7, and ARMv8 before ARMv8.1, where CAS_weak can be one LL/SC (`ldrex` / `stlex`) without a retry loop which CAS_strong needs. https://en.wikipedia.org/wiki/Load-link/store-conditional – Peter Cordes Sep 02 '23 at 00:25
  • 1
    Related: [c++11: how to produce "spurious failures" upon compare\_exchange\_weak?](https://stackoverflow.com/q/72766332) / [std::atomic | compare\_exchange\_weak vs. compare\_exchange\_strong](https://stackoverflow.com/q/4944771) and [what's the difference between compareAndSet and weakCompareAndSet in AtomicReference?](https://stackoverflow.com/q/36428044) – Peter Cordes Sep 02 '23 at 00:38
  • Thanks @PeterCordes, this is not the first time I've been asking a variation of this question, which you've answered yourself previously, so thanks. it has been difficult for me to finally do a good question; I think I finally found the perfect way to express the doubts I had about this. IMO it is really sad that processors seem to b getting rid of CAS_weak, it seems that CAS_weak has ten times the utility that strong has,at least for all the code I've seen where programmers work with volatile updatable variables at least 80% of it would've been benefited from the weak version... if they knew – Delark Sep 02 '23 at 01:28
  • Weirdly enough, the only one giving a concise reason for the failure, is your [answer](http://https://stackoverflow.com/questions/72766332/c11-how-to-produce-spurious-failures-upon-compare-exchange-weak) which states: "not merely(failing) from competition for the cache line from other threads." which aling with my assumption that the strong implementation does an inner spin. Not even the LL/SC wikipedia page explicitly states the reason. – Delark Sep 02 '23 at 01:45
  • Also this other [answer](https://stackoverflow.com/questions/36428044/whats-the-difference-between-compareandset-and-weakcompareandset-in-atomicrefer) you linked which states: "a strong CAS implementation based on it has to loop internally". – Delark Sep 02 '23 at 01:45
  • 2
    *it seems that CAS_weak has ten times the utility that strong has* - huh, How do you mean? If you want to avoid dirtying the cache line on failure with a single-instruction CAS, check read-only with `.load(relaxed)` before trying the CAS at all. (e.g. in the retry path for a spinlock). [C optimization: conditional store to avoid dirtying a cache line](https://stackoverflow.com/q/22717010) and discussion in comments on [Does cmpxchg write destination cache line on failure? If not, is it better than xchg for spinlock?](https://stackoverflow.com/q/63008857) – Peter Cordes Sep 02 '23 at 02:57
  • 2
    It's not like hardware single-instruction CAS loops internally, though. It's normally implemented by getting the cache to delay responding to MESI requests to share or invalidate the line until after the check and store of the new value happen. This is very cheap. LL/SC just reports failure of the transaction because as separate instructions, the hardware can't let sofware deadlock the machine. But as a single instruction, the hardware knows the unlock is coming right away, so it can effectively make a transaction. – Peter Cordes Sep 02 '23 at 03:00

0 Answers0