The answer to "What are _mm_prefetch() locality hints?" goes into detail on what each hint means.
My question is: which one do I WANT?
I am working on a function that is called repeatedly, billions of times, with an int parameter among others. The first thing I do is look up a cached value, using that parameter (its low 32 bits) as a key into a 4 GB cache. From the algorithm that calls this function, I know that the key will most often be doubled (shifted left by 1 bit) from one call to the next, so I am doing:
int foo(int key) {
    uint8_t value = cache[key];
    _mm_prefetch((const char *)&cache[key * 2], _MM_HINT_T2);
    // ...
The goal is to have this value in a processor cache by the next call to this function.
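To be explicit about the index arithmetic, here is a self-contained sketch of the same idea. The names next_key_guess and lookup_and_prefetch are made up for illustration, and passing the table in as a parameter and using uint32_t instead of int are assumptions on my part (unsigned doubling wraps modulo 2^32 instead of overflowing a signed int):

```c
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T2 */

/* Hypothetical helper: the key I expect on the next call.
   Unsigned arithmetic wraps modulo 2^32, so the guess is always a
   valid index into a table addressed by 32 bits. */
static inline uint32_t next_key_guess(uint32_t key) {
    return key << 1;
}

/* Hypothetical variant of foo() that takes the table as a parameter. */
static inline uint8_t lookup_and_prefetch(const uint8_t *cache, uint32_t key) {
    uint8_t value = cache[key];
    /* Hint only: request the cache line holding the guessed next entry. */
    _mm_prefetch((const char *)&cache[next_key_guess(key)], _MM_HINT_T2);
    return value;
}
```

The hint (_MM_HINT_T2 here) is the part I am unsure about; the rest only restates the snippet above.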
I am looking for confirmation on my understanding of two points:
- The call to _mm_prefetch is not going to delay the processing of the instructions immediately following it.
- There is no penalty for pre-fetching the wrong location, just the lost benefit of guessing it right.
That function also uses a lookup table of 128 128-bit values (2 KB total). Is there a way to “force” it to stay cached? The index into that lookup table is incremented sequentially; should I pre-fetch those entries too? Should I use a different hint for them, to target a different level of cache? What is the best strategy here?
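For concreteness, this is roughly what I have in mind for the small table. It is only a sketch: the names lut, lut_warm and lut_get are made up, I am assuming 64-byte cache lines and an index that wraps around at 128, and _MM_HINT_T0 is just my guess at the right hint. Nothing here pins the table in cache; it only issues hints.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */
#include <emmintrin.h>  /* __m128i (SSE2) */

enum { LUT_ENTRIES = 128, CACHE_LINE = 64 };  /* 64-byte line is an assumption */

/* Hypothetical table: 128 x 16 bytes = 2 KB = 32 lines when 64-byte aligned. */
static _Alignas(64) __m128i lut[LUT_ENTRIES];

/* Issue one prefetch hint per cache line of the table (e.g. at start-up). */
static void lut_warm(void) {
    for (size_t off = 0; off < sizeof lut; off += CACHE_LINE)
        _mm_prefetch((const char *)lut + off, _MM_HINT_T0);
}

/* Read entry i (wrapping at 128). On the first entry of a line,
   also hint the following line, since the index advances sequentially. */
static __m128i lut_get(unsigned i) {
    i %= LUT_ENTRIES;
    if (i % 4 == 0) {  /* 4 entries of 16 bytes per 64-byte line */
        unsigned next = (i + 4) % LUT_ENTRIES;
        _mm_prefetch((const char *)&lut[next], _MM_HINT_T0);
    }
    return lut[i];
}
```

Is something like lut_get worth doing for a table this small, or is a 2 KB table that is read on every call going to stay in L1 anyway?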