I'm currently working my way through Computer Architecture: A Quantitative Approach by Hennessy and Patterson. In Chapter 5 (Thread-Level Parallelism), the authors discuss cache coherence and replication for multiprocessing. A few pages before the replication discussion, they set up a use case and ask readers to make the following assumptions:
- Processor A writes to memory location X.
- Processor A writes to memory location Y.
- Processor C reading from memory location Y will see the correct value; this implies that Processor C will also see the correct value of memory location X.
The logical conclusion is that these restrictions allow processors to reorder reads, but force the processor to complete writes in program order.
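To make sure I follow the property, here is a toy sketch of my own (not from the book) of the classic flag/data pattern those assumptions describe: X is the payload and Y is the flag. If writes complete in program order, a reader that sees the new value of Y must also see the new value of X. (In CPython the GIL happens to serialize these accesses; in C/C++ you would need release/acquire atomics to get the same guarantee.)

```python
import threading

X = 0  # memory location X (payload)
Y = 0  # memory location Y (flag)

def processor_a():
    global X, Y
    X = 42   # Processor A writes X first...
    Y = 1    # ...then writes Y, in program order

def processor_c(result):
    # Spin until the write to Y becomes visible...
    while Y == 0:
        pass
    # ...then the earlier write to X must be visible too.
    result.append(X)

result = []
t_c = threading.Thread(target=processor_c, args=(result,))
t_a = threading.Thread(target=processor_a)
t_c.start()
t_a.start()
t_a.join()
t_c.join()
print(result[0])  # 42
```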
However, a few paragraphs later, when discussing replication as a scheme for enforcing coherence, they say:
Replication reduces both latency of access and contention for a read shared data item.
My interpretation is that replicating data into local caches allows multicore processors to reduce latency because of data locality: the data sits significantly closer to the processor. I agree with that part. However, I'm unclear as to why there is contention for a read shared data item. That seems to imply a RAR (Read After Read) data hazard, which I know does not really exist.
Unless processors are attempting writes to a shared memory location, why would there be any sort of contention in reading a shared data item?
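For what it's worth, my current guess is that the contention is for the shared resource servicing the reads (a bus, memory bank, or directory), not a data hazard between the reads themselves. Here is a toy counter of my own (the function and parameter names are my invention) contrasting the number of requests that queue up at a single home node with and without replication:

```python
# Toy model: n_procs processors each read one shared item reads_per_proc
# times. Without replication, every read must be serviced by the single
# home memory/bank, so reads serialize there; with replication, only the
# first read per processor misses, and the rest hit the local cached copy.
def accesses_to_home(n_procs, reads_per_proc, replicated):
    if replicated:
        return n_procs * 1           # one miss each to fill the local copy
    return n_procs * reads_per_proc  # every read queues at the home node

no_repl = accesses_to_home(4, 1000, replicated=False)
repl = accesses_to_home(4, 1000, replicated=True)
print(no_repl, repl)  # 4000 4
```

Is that the right way to think about it, or is there some other source of contention for a purely read-shared item?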
Edit: There are plenty of posts on Stack Overflow about thread contention, including What is thread contention?, but these almost exclusively use locks as an example. My understanding is that locks are a higher-level application pattern for enforcing coherence. Moreover, all the examples I see in the answers involve some sort of modification (a write) to the target data item.