The gather prefetch intrinsic _mm512_mask_prefetch_i32gather_ps can be used to prefetch 32-bit floats on Knights Corner.
Since a corresponding intrinsic for doubles does not exist, how should this intrinsic be used to prefetch 64-bit or 128-bit elements?
Does each 4-byte chunk need to be explicitly prefetched, or can we assume that each prefetch of a 32-bit variable will actually prefetch the entire 64-byte cache line that it occupies?
Example:
I want to prefetch 4 doubles at offsets {1,2,10,12} from base address 0xf0000000.
This corresponds to addresses of {0xf0000008, 0xf0000010, 0xf0000050, 0xf0000060}.
These occupy two cache lines starting at {0xf0000000, 0xf0000040}.
Would it be sufficient to use _mm512_mask_prefetch_i32gather_ps with the base addresses of these two cache lines?
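To make the address arithmetic in the example concrete, here is a minimal sketch in plain C that maps element offsets to the distinct cache lines they touch. It does not call any MIC intrinsic and says nothing about how the hardware prefetcher behaves. The helper name distinct_cache_lines and the CACHE_LINE_BYTES constant are my own illustrative names, and the sketch assumes 64-byte lines and naturally aligned elements, so that no element straddles two lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as stated in the question. */
#define CACHE_LINE_BYTES 64u

/*
 * Hypothetical helper (not part of any Intel API).
 * For n elements of elem_bytes each, located at element offsets[] from base,
 * store the base address of every distinct cache line they touch in
 * lines_out (capacity >= n), in order of first appearance.
 * Assumes elements are naturally aligned, so each lies within one line.
 * Returns the number of distinct lines found.
 */
static size_t distinct_cache_lines(uint64_t base, const int32_t *offsets,
                                   size_t n, size_t elem_bytes,
                                   uint64_t *lines_out)
{
    assert(elem_bytes > 0 && elem_bytes <= CACHE_LINE_BYTES);

    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        /* Byte address of element i (wraps correctly for negative offsets). */
        uint64_t addr = base + (uint64_t)(int64_t)offsets[i] * elem_bytes;

        /* Round down to the start of the containing cache line. */
        uint64_t line = addr & ~(uint64_t)(CACHE_LINE_BYTES - 1);

        /* Record the line only if it has not been seen before. */
        int seen = 0;
        for (size_t j = 0; j < count; ++j) {
            if (lines_out[j] == line) {
                seen = 1;
                break;
            }
        }
        if (!seen) {
            lines_out[count++] = line;
        }
    }
    return count;
}
```

Called with base 0xf0000000, offsets {1,2,10,12} and sizeof(double) as the element size, this yields the two line addresses listed above. Whether one 32-bit gather prefetch per line is enough is exactly what I am asking.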
I originally posted this question on the Intel MIC forum without success.