
I was going through this article: https://itnext.io/understanding-the-lmax-disruptor-caaaa2721496 and I'm having trouble understanding this paragraph:

In our example, because sharedCounter is updated by two CPU cores at the same time, it will bounce the variable between the two local L1 caches which will slow down drastically the execution.

How does bouncing the variable between two local L1 caches make it slower?

dopller
    Related: [What are the latency and throughput costs of producer-consumer sharing of a memory location between hyper-siblings versus non-hyper siblings?](https://stackoverflow.com/q/45602699) / [What happens when different CPU cores write to the same RAM address without synchronization?](https://stackoverflow.com/q/48817022) / [How are cache memories shared in multicore Intel CPUs?](https://stackoverflow.com/q/944966) – Peter Cordes Jul 02 '22 at 01:28
    Also [Why is L1 write access access worst with two threads in the same core (hyperthreading) than two cores?](https://stackoverflow.com/q/51965481) (But that's write-only access, not atomic RMW increments) – Peter Cordes Jul 02 '22 at 01:29

1 Answer


Coherent L1 caches are implemented with some version or variant of the MESI protocol. In this protocol, a given core has to "own" a cache line before being able to write to it. If the cache line is already owned by another core, there is a mechanism to request that ownership be transferred and the updated data copied over, but this takes time.

If two cores are frequently writing to the same variable (or to two different variables in the same cache line, in the case of false sharing), then most of the writes will need such a transfer, slowing everything down. But if only one core needs to write it, the transfer is not required, so it is faster.

A comprehensive writeup of this and other memory-related topics is Ulrich Drepper's long article "What Every Programmer Should Know About Memory" (available in HTML and PDF).

Nate Eldredge