7

As per this question's answer, it seems that LOCK CMPXCHG on x86 actually acts as a full barrier. Presumably, this is what Unsafe.compareAndSwapInt() generates under the hood as well. I am struggling to see why that is the case: with the MESI protocol, after you have updated the cache line, couldn't the CPU simply invalidate just that cache line on other cores, rather than draining ALL store/load buffers of the core which performed the CAS? Seems rather wasteful to me...

Bober02
  • With a full barrier, you would actually flush all your missed prediction changes, instead of one cache line, so wouldn't it be worse with the full barrier? But obviously I am missing something here :) – Bober02 Jul 13 '17 at 12:49
  • [Compare-and-swap](https://en.wikipedia.org/wiki/Compare-and-swap) on Wikipedia covers this: *It compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail.* Without a full barrier it might be interrupted (or otherwise updated) and that could invalidate atomicity. – Elliott Frisch Jul 14 '17 at 02:27

1 Answer

1

Your answer, as far as I can see, is in the comments: MESI keeps the caches coherent, not the store/load buffers. But the description of LOCK CMPXCHG says: locked operations serialize all outstanding load and store operations. This is why it needs to drain the store/load buffers of this CPU (and not of the others, as detailed here).

So the current CPU has to perform the atomic operation on the most recent value, and that value could still reside in its own store/load buffers. That's why a fence is needed there: to actually drain them.
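To make the store buffer argument concrete, here is a minimal sketch of the classic store-buffering litmus test in Java. The class name, fields and helper methods are all hypothetical, made up for illustration. It assumes (as the question does) that the JVM compiles `AtomicInteger.compareAndSet` to LOCK CMPXCHG on x86, which I have not verified for every JVM. Each thread writes its own plain field, performs a CAS on a shared counter, then reads the other thread's field. Because a successful CAS behaves like a volatile read plus a volatile write in the Java memory model, the outcome where both threads read 0 should not be observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical litmus test; all names here are illustrative only.
final class StoreBufferLitmus {
    // Plain (non-volatile) fields: without a barrier, a store to one of them
    // may still sit in the writing core's store buffer when the next load runs.
    static int x, y;
    static int r1, r2;
    // Both threads CAS this shared counter between their store and their load.
    static final AtomicInteger fence = new AtomicInteger();

    // Retries until the CAS succeeds; a successful compareAndSet has
    // volatile read and volatile write semantics.
    static void casIncrement() {
        int cur;
        do {
            cur = fence.get();
        } while (!fence.compareAndSet(cur, cur + 1));
    }

    // Waits for a thread, converting the checked exception to an unchecked one.
    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Runs one trial and reports whether both threads read the stale value 0.
    static boolean bothReadZero() {
        x = 0; y = 0; r1 = -1; r2 = -1;
        Thread t1 = new Thread(() -> { x = 1; casIncrement(); r1 = y; });
        Thread t2 = new Thread(() -> { y = 1; casIncrement(); r2 = x; });
        t1.start(); t2.start();
        join(t1); join(t2);
        return r1 == 0 && r2 == 0;
    }

    // Counts how many trials ended with both threads reading 0.
    static int countBothZero(int trials) {
        int count = 0;
        for (int i = 0; i < trials; i++) {
            if (bothReadZero()) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("both-zero outcomes: " + countBothZero(1000));
    }
}
```

If you remove the `casIncrement()` calls, the both-zero outcome becomes legal, since each store can still be waiting in its core's store buffer while the following load executes. Whether you actually observe it depends on timing and on what the JIT does with the code, so treat this as an illustration rather than a benchmark.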

Eugene
  • I get ya! We need to fully sync up before we swap the cache line, cool! One more thing - does the full mem barrier actually cause other CPUs' store buffers to be flushed? Or only the one the thread is operating on? In other words, I was under the impression that pending changes from other CPUs are in my CPU's load buffer, so the mem barrier just drains those – Bober02 Aug 09 '17 at 20:41
  • @Bober02 I've actually updated the answer to make it a bit clearer - as per my understanding. – Eugene Aug 10 '17 at 11:59