
The official RISC-V 'Unprivileged Specification version 20191213' says on p. 31:

The FENCE.I instruction was designed to support a wide variety of implementations. A simple implementation can flush the local instruction cache and the instruction pipeline when the FENCE.I is executed. A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss

This seems to mean that a data cache miss will search the instruction cache (normally the L1I cache). That seems to conflict with the purpose of an instruction cache, although in general instructions and data are both just binary data.


Then I found this Arm blog. It explains the case where we snoop the data cache on an instruction cache miss: in a JIT, the instructions are modified and the corresponding instruction cache line is invalidated.
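
To make the two snoop directions in the quoted passage concrete for myself, here is a minimal toy model in Python. It is only a sketch of how I read the passage, not how any real RISC-V core works. `ToyCore`, its methods, and the one-word "cache lines" are all hypothetical names and simplifications of mine, and I assume a write-back D-cache plus the rule that, with snooping enabled, a line lives in at most one L1 cache at a time.

```python
class ToyCore:
    """Toy single core with split L1 caches over a flat memory (illustrative only)."""

    def __init__(self, memory, snoop):
        self.memory = dict(memory)  # backing store: address -> word
        self.icache = {}            # address -> word, read-only copies
        self.dcache = {}            # address -> word, possibly dirty (write-back)
        self.snoop = snoop          # True: each L1 snoops the other on a miss

    def store(self, addr, value):
        # D-cache miss: with snooping, invalidate any I-cache copy of the line.
        if self.snoop and addr not in self.dcache:
            self.icache.pop(addr, None)
        self.dcache[addr] = value   # memory is NOT updated yet (write-back)

    def fetch(self, addr):
        if addr not in self.icache:
            # I-cache miss: with snooping, pull a dirty line out of the D-cache
            # (write it back and drop it) before refilling from memory.
            if self.snoop and addr in self.dcache:
                self.memory[addr] = self.dcache.pop(addr)
            self.icache[addr] = self.memory[addr]
        return self.icache[addr]

    def fence_i(self):
        if self.snoop:
            return  # caches are already coherent; pipeline flush is not modelled
        # "Simple implementation": flush the I-cache. Assumption of this model:
        # dirty data is also written back so that the refill can see it.
        self.memory.update(self.dcache)
        self.icache.clear()


simple = ToyCore({0x100: "old"}, snoop=False)
simple.fetch(0x100)              # I-cache now holds "old"
simple.store(0x100, "new")       # JIT-style write through the D-cache
stale = simple.fetch(0x100)      # I-cache hit: still the old instruction
simple.fence_i()
fresh = simple.fetch(0x100)      # refetched after the flush

snoopy = ToyCore({0x100: "old"}, snoop=True)
snoopy.fetch(0x100)
snoopy.store(0x100, "new")       # D-cache miss invalidates the I-cache copy
coherent = snoopy.fetch(0x100)   # I-cache miss finds the line in the D-cache
```

In this model the non-snooping core keeps fetching the old word until `fence_i()` runs, while the snooping core sees the new word without any cache flush, so its `fence_i()` has nothing left to do in the caches.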

Q1: Are there other situations in which we should snoop the data cache on an instruction cache miss?

I also found this Q&A comment. In modern CISC CPUs (RISC-V is RISC; "CISC" here is the term used in this LWN article), the L1I cache is actually a trace cache (i.e. a micro-op cache). So I guess the decoded micro-ops may hold useful data that can be used to resolve a data cache miss.

Q2: Is my guess right? Also, are there other situations in which we should snoop the instruction cache on a data cache miss?

  • Snooping makes fences cheaper. The less you track, the more it costs to guarantee that a fence gives its guarantees, e.g. worst case flush everything. In a related case on x86 (where I-cache coherence is required on paper only on jumps), real CPUs snoop always, because nuking the pipeline on every jump would be too expensive. ([Observing stale instruction fetching on x86 with self-modifying code](https://stackoverflow.com/a/18388700)) – Peter Cordes Jun 24 '23 at 14:36
  • Also, no, L1i isn't a decoded-uop cache; that's only useful on x86 where decode is expensive. And even then (https://www.realworldtech.com/sandy-bridge/4/), the uop cache is L0, separate from L1i, and it's not a trace cache, just a uop cache organized by address of the instructions they were decoded from, not threading jumps. (Except on Pentium 4, where L1i was a trace cache. It was bad. Low capacity made worse by being a trace cache which could cache the same instructions twice for different branching paths, and legacy decode was slow on L1i miss.) – Peter Cordes Jun 24 '23 at 14:39
  • Thanks. The comments above explain why RISC-V uses snooping together with the fence, but why does RISC-V snoop the **instruction** cache on every data cache miss instead of snooping the **data** cache? Maybe I didn't describe the question clearly. I have updated it. – zg c Jun 24 '23 at 15:59
  • If it didn't snoop the I-cache, it wouldn't know if there were dirty lines in the D-cache that were also valid in the I-cache. `fence.i` syncs instructions and data, right? Like after JIT storing some instructions into a buffer, run `fence.i` before jumping to that buffer? Or does the JIT also have to run instructions to force write-back of specific cache lines (to some unified level of cache) before that, like I think ARM does? I don't know RISC-V very well so I don't know what ISA requirements the HW has to meet, or what typical implementation strategies are. – Peter Cordes Jun 24 '23 at 18:51
  • Do you mean that it snoops the I-cache to catch an instruction like `STR Ra, [Rb, imm]`, which may modify an instruction and update the D-cache? If so, I understand the problem. – zg c Jun 25 '23 at 02:46
  • So it will notice self-modifying code **more quickly** (the quick detection is similar to a **bypass** in some way). Otherwise it would snoop the D-cache, which may hold the modified instruction only **afterwards**. The line may then be written back by `dc cvau` once the specific instruction in the I-cache has been invalidated by `ic ivau` (these two Arm instructions are taken from this [Q&A](https://stackoverflow.com/questions/70635862)). – zg c Jun 25 '23 at 02:51
