On an Intel CPU, I want CPU core A to signal CPU core B when A has completed an event. There are a couple ways to do this:
- A sends an interrupt to B.
- A writes a cache line (e.g., a bit flip) and B polls the cache line.
I want B to learn about the event with the least amount of overhead possible. Note that I am referring to overhead, not end-to-end latency. It's alright if B takes a while to learn about the event (e.g., periodic polling is fine), but B should waste as few cycles as possible detecting the event.
Option 1 above has too much overhead due to the interrupt handler. Option 2 is better, but I am still unhappy with the amount of time that B must wait for the cache line to transfer from A's L1 cache to its own L1 cache.
Is there some way A can directly push the cache line into B's L1 cache? It's fine if there is additional overhead for A in this case. I'm not sure if there some trick I can try where A marks the page as uncacheable and B marks the page as write-back...
Alternatively, is there some other mechanism built into Intel processors that can help with this?
I assume this is less of an issue on AMD CPUs as they use the MOESI coherence protocol, so the "O" should presumably allow A to broadcast the cache line changes to B.