
Multiprocessor systems have some kind of cache-coherency protocol built into them, e.g. MSI, MESI, etc. The only case where cache coherency matters is when instructions executing on two different processors try to write/read shared data. For the shared data to be practically valid, the programmer has to introduce memory barriers anyway. If there is no memory barrier, the shared data is going to be "wrong" regardless of whether the underlying processor implements cache coherence or not. Why, then, the need for cache-coherence mechanisms at the hardware level?

Peter Cordes
driewguy
  • Without cache coherency, memory-ordering barriers wouldn't be sufficient to make data visible between cores. Also, it's not true that barriers are always needed. An atomic counter can work for some purposes with `std::memory_order_relaxed`, i.e. just atomicity, no ordering wrt. *other* operations. Perhaps you're misunderstanding exactly what barriers do: [Does a memory barrier ensure that the cache coherence has been completed?](https://stackoverflow.com/q/42746793). Also [When to use volatile with multi threading?](//stackoverflow.com/a/58535118) discusses coherence making hand-rolled C atomics work – Peter Cordes Nov 20 '21 at 22:55
  • Not just shared data, also adjacent data in the same cache line. – root Nov 21 '21 at 06:34
  • What I meant was: how do the guarantees weaken (or the program execute wrongly) if the processor ran the cache-coherency protocol only when a memory barrier is encountered, and did not run it at all until the next memory-barrier instruction? – driewguy Nov 22 '21 at 21:01
  • Re your attempted answer: cache coherency is always maintained, not broken and restored after writes. So even *before* a write can become visible to other cores, the writing core needs to get exclusive ownership of the cache line, in MESI-style systems with an RFO (read for ownership). That happens after the store *executes* and puts data into the (per-core-private) store buffer, but must complete before the store can commit from the SB to L1d cache. [Can a speculatively executed CPU branch contain opcodes that access RAM?](https://stackoverflow.com/q/64141366) – Peter Cordes Dec 04 '21 at 22:03

1 Answer


Without cache coherency, instead of mere barriers, you'd have to explicitly flush and invalidate caches around every access to shared data, which has a much higher overhead than hardware cache coherency.

Historically, there have been a few non-cache-coherent shared-memory multiprocessor architectures, but they have all died out in favor of cache-coherent designs, due to being very difficult to program correctly and efficiently.

janneb