
Assume this scenario:

We are going to patch an instruction in executable memory while another thread is executing it. Here is sample code in C:

    Instruction* mem; // pointer to executable memory
    int offset;

    void patch(Instruction new_insn) {
        mem[offset] = new_insn;
    }

Here is what could happen:

1- Thread 1 loads a cache-line-sized section of memory into the instruction cache to execute it.

2- Thread 2 patches one instruction in the same address space using the code above. The new instruction first gets stored in the data cache before being written to memory.

3- Now the copy in the instruction cache is stale. Does the data cache communicate with the instruction cache to invalidate the line so it gets refetched? Does the data get copied from one cache to the other, or does the instruction cache need to refetch from memory?

Dan
  • In x86 asm yes, I-cache is coherent. [Observing stale instruction fetching on x86 with self-modifying code](https://stackoverflow.com/q/17395557). But in C, no, you need to tell the compiler about code stores so it doesn't optimize them away, e.g. with GNU C `__builtin___clear_cache`. On x86 it doesn't actually invalidate any cache, just makes sure the stores aren't treated as "dead" and optimized away. [How to get c code to execute hex machine code?](https://stackoverflow.com/q/9960721) – Peter Cordes Oct 09 '21 at 18:45
  • I see, the suggested post above has a line saying `the write to an address that's in one of the cache lines in the instruction cache invalidates it from the instruction cache. No "synchronization" is involved.` So does this mean data is never copied from cache to cache? Does the line always get invalidated so it is refetched from memory, or could it get copied on some architectures (i.e. is it hardware-specific)? – Dan Oct 09 '21 at 18:49
  • L2 and L3 caches are unified, so probably a read miss by L1i for a dirty line will trigger an L1d write-back by some mechanism. Possibly out to L3, but I would be shocked if DRAM was involved. The more significant cost is that you'll likely get a pipeline nuke on the CPU core that had some instructions invalidated, if any were in flight in the pipeline. (perf event `machine_clears.smc`) – Peter Cordes Oct 09 '21 at 18:54
  • Sorry I was referring to Instruction cache and Data cache, not L1/L2. Does the data between ic and dc ever get copied directly? – Dan Oct 09 '21 at 18:56
  • Like I said, no L1d->L1i transfer AFAIK. L1i is the L1 instruction cache, L1d is the L1 data cache. L2 is the unified cache that both of those fetch through. See [L1 caches usually have split design, but L2, L3 caches have unified design, why?](https://stackoverflow.com/q/64184265) / [What does a 'Split' cache means. And how is it useful(if it is)?](https://stackoverflow.com/q/55752699) if you weren't familiar with this part of cache hierarchies. Also https://www.realworldtech.com/sandy-bridge/. (Also see my edit to previous comment re: machine clears on cross-modifying code.) – Peter Cordes Oct 09 '21 at 18:58
  • I see, so it could go all the way down to L3 but most likely not all the way back to RAM. Thanks a lot for clarifying. – Dan Oct 09 '21 at 19:00
  • Yeah, cache coherence mechanisms avoid having to go to RAM. See also [Which cache mapping technique is used in intel core i7 processor?](https://stackoverflow.com/q/49092541) – Peter Cordes Oct 09 '21 at 19:01
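
For reference, the `__builtin___clear_cache` suggestion from the comments could be applied to the question's `patch` roughly like this. It is only a minimal sketch, not a complete cross-modifying-code recipe: the `uint32_t` typedef for `Instruction` and the extra `mem`/`offset` parameters are assumptions made here so the snippet compiles on its own, and it needs GCC or Clang for the builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption for this sketch: instructions occupy fixed 4-byte slots.
   The question does not say what Instruction really is. */
typedef uint32_t Instruction;

/* Overwrite one instruction slot, then report the modified byte range.
   mem and offset are passed in here (hypothetical signature); the
   question's version uses globals instead. */
void patch(Instruction *mem, size_t offset, Instruction new_insn)
{
    assert(mem != NULL);

    Instruction *slot = &mem[offset];

    /* Store the new instruction bytes. */
    memcpy(slot, &new_insn, sizeof new_insn);

    /* GNU C builtin taking [begin, end). Per the comments above, on x86
       this invalidates nothing and only stops the compiler from treating
       the store as dead; on ISAs without a coherent I-cache it is also
       where the actual cache maintenance happens. */
    __builtin___clear_cache((char *)slot, (char *)(slot + 1));
}
```

This only covers the writing side. Whether the thread that is executing the code needs anything extra depends on the ISA, and the comments above only address x86.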

0 Answers