I have a kernel which makes use of four different memories:
- memory A (around 2MB) is used only once (for loading)
- memory B (around 2MB) is used only once (for storing)
- memory C (32KB) and D (32KB) are read-only and are accessed hundreds of times
Accesses to memory A and B coalesce, but accesses to C and D do not (they are not random, though; a huge contiguous chunk of A/B uses the same 32 bytes of C/D).
Memory A and B have a one-to-one correspondence, i.e. every 16 bytes of memory B is a modified version of the corresponding 16 bytes of memory A.
Stores are presumably not cached and would not evict/invalidate existing cache lines if all the pointers are marked __restrict. No problems with memory B.
Memory C and D can be loaded using __ldg since they are used frequently and are not mutable.
Memory A is used only once. Hence, there is simply no point in caching the loads from A. Unfortunately, by default, they are cached in L2. This might cause the useful cache lines containing C and D to be evicted.
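To make the setup concrete, here is a minimal sketch of the access pattern described above. The names (transform, mix, run_demo, kChunk), the arithmetic in mix, and the exact mapping from a chunk of A/B to a 32-byte slice of C/D are all placeholders I made up for illustration; the real kernel differs. The plain load from a is the access this question is about.

```cuda
#include <cstdio>
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

// Sizes from the description: A/B are 2 MB, C/D are 32 KB, 16-byte elements.
constexpr size_t kElemsAB = (2u << 20) / sizeof(uint4);
constexpr size_t kElemsCD = (32u << 10) / sizeof(uint4);
// Assumption: each 32-byte slice of C/D (two uint4) serves one contiguous chunk of A/B.
constexpr size_t kChunk = kElemsAB / (kElemsCD / 2);

// Placeholder arithmetic; the real per-element transformation is not shown here.
__host__ __device__ inline uint4 mix(uint4 v, uint4 x, uint4 y)
{
    v.x = (v.x ^ x.x) + y.x;
    v.y = (v.y ^ x.y) + y.y;
    v.z = (v.z ^ x.z) + y.z;
    v.w = (v.w ^ x.w) + y.w;
    return v;
}

__global__ void transform(const uint4* __restrict__ a, uint4* __restrict__ b,
                          const uint4* __restrict__ c, const uint4* __restrict__ d,
                          size_t n)
{
    size_t i = blockIdx.x * static_cast<size_t>(blockDim.x) + threadIdx.x;
    if (i >= n) return;

    uint4 v = a[i];                // A: coalesced, read exactly once (plain load)
    size_t s = (i / kChunk) * 2;   // first uint4 of the 32-byte slice for this chunk
    for (int k = 0; k < 2; ++k)    // C/D: read-only, loaded through __ldg
        v = mix(v, __ldg(&c[s + k]), __ldg(&d[s + k]));
    b[i] = v;                      // B: coalesced, written exactly once
}

// Runs the kernel once and returns the number of elements that differ from a
// host-side reference, or -1 if a CUDA call failed.
int run_demo()
{
    std::vector<uint4> ha(kElemsAB), hb(kElemsAB), hc(kElemsCD), hd(kElemsCD);
    for (size_t i = 0; i < kElemsAB; ++i) {
        unsigned u = static_cast<unsigned>(i);
        ha[i] = make_uint4(u, u + 1u, u + 2u, u + 3u);
    }
    for (size_t i = 0; i < kElemsCD; ++i) {
        unsigned u = static_cast<unsigned>(i);
        hc[i] = make_uint4(3u * u, 5u * u, 7u * u, 11u * u);
        hd[i] = make_uint4(u + 13u, u + 17u, u + 19u, u + 23u);
    }

    const size_t bytesAB = kElemsAB * sizeof(uint4);
    const size_t bytesCD = kElemsCD * sizeof(uint4);
    uint4 *da = nullptr, *db = nullptr, *dc = nullptr, *dd = nullptr;
    cudaMalloc(&da, bytesAB);
    cudaMalloc(&db, bytesAB);
    cudaMalloc(&dc, bytesCD);
    cudaMalloc(&dd, bytesCD);
    cudaMemcpy(da, ha.data(), bytesAB, cudaMemcpyHostToDevice);
    cudaMemcpy(dc, hc.data(), bytesCD, cudaMemcpyHostToDevice);
    cudaMemcpy(dd, hd.data(), bytesCD, cudaMemcpyHostToDevice);

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((kElemsAB + threads - 1) / threads);
    transform<<<blocks, threads>>>(da, db, dc, dd, kElemsAB);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)        // blocking copy, also waits for the kernel
        err = cudaMemcpy(hb.data(), db, bytesAB, cudaMemcpyDeviceToHost);
    cudaFree(da); cudaFree(db); cudaFree(dc); cudaFree(dd);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (size_t i = 0; i < kElemsAB; ++i) {
        size_t s = (i / kChunk) * 2;
        uint4 r = mix(mix(ha[i], hc[s], hd[s]), hc[s + 1], hd[s + 1]);
        if (r.x != hb[i].x || r.y != hb[i].y || r.z != hb[i].z || r.w != hb[i].w)
            ++mismatches;
    }
    return mismatches;
}

int main()
{
    std::printf("mismatches: %d\n", run_demo());
    return 0;
}
```

This should build with nvcc for any target that supports __ldg (compute capability 3.5 or higher). The host-side check only confirms that the sketch is self-consistent; it says nothing about how the loads are cached.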
How do I inform the compiler that I do not want loads from memory A to be cached?