Intel hardware prefetcher

Intel's website shows that there are four kinds of hardware prefetchers. The prefetcher controlled by bit 3 is the L1 IP-based stride prefetcher. I am running test code to find out what the trigger condition of the stride prefetcher is. I run the code with the following steps (MSR 0x1a4 is set to 0x7, which means that only the L1 IP-based stride prefetcher is enabled):
Repeat the following 10000 times:
flush
training phase: access lines 0, 3, 6, 9
sleep for about 1000 cycles
measure phase: measure the access time of line 12
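For readers who do not want to open the full test code, one iteration of the steps above can be sketched roughly as follows. This is a minimal sketch, not the actual test.c: it assumes x86-64 with GCC or Clang, a 64-byte cache line, and a `clflush`-based flush, and the names `LINE_SIZE`, `NUM_LINES`, `line_addr`, `time_load`, `spin_cycles` and `run_trial` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* _mm_clflush, _mm_mfence, _mm_lfence, _mm_pause, __rdtsc, __rdtscp */

#define LINE_SIZE 64    /* assumed cache line size in bytes */
#define NUM_LINES 16    /* hypothetical buffer size in lines; must cover line 12 */

/* Address of cache line `idx` inside `buf`. */
static inline uint8_t *line_addr(uint8_t *buf, size_t idx)
{
    return buf + idx * LINE_SIZE;
}

/* Time one load in TSC ticks, fenced so the load stays between the two reads. */
static inline uint64_t time_load(const uint8_t *p)
{
    unsigned aux;
    _mm_mfence();
    _mm_lfence();
    uint64_t start = __rdtsc();
    _mm_lfence();
    (void)*(const volatile uint8_t *)p;
    uint64_t end = __rdtscp(&aux);
    _mm_lfence();
    return end - start;
}

/* Busy-wait for roughly `n` TSC ticks. */
static inline void spin_cycles(uint64_t n)
{
    uint64_t start = __rdtsc();
    while (__rdtsc() - start < n)
        _mm_pause();
}

/* One iteration: flush, train on lines 0 3 6 9, wait, then time line 12. */
static uint64_t run_trial(uint8_t *buf)
{
    /* Flush every line of the buffer out of the cache hierarchy. */
    for (size_t i = 0; i < NUM_LINES; i++)
        _mm_clflush(line_addr(buf, i));
    _mm_mfence();

    /* Training phase: stride of 3 lines, all loads issued from this one loop. */
    for (size_t i = 0; i <= 9; i += 3)
        (void)*(volatile uint8_t *)line_addr(buf, i);

    spin_cycles(1000);

    /* Measure phase: latency of the line the prefetcher would fetch next. */
    return time_load(line_addr(buf, 12));
}
```

Calling `run_trial` 10000 times on a page-aligned buffer and recording the returned latencies reproduces the experiment in outline. Note that this sketch issues the training loads from a single load instruction inside a loop (unless the compiler unrolls it); test.c may issue them differently.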
I expect line 12 to be prefetched into the cache. However, I see that only lines 0, 3, 6 and 9 hit in the cache. No stride prefetching activity can be observed, even after I change the stride or the length of the access pattern. So I wonder whether anyone has seen prefetching activity on an Intel processor, or whether there are some special trigger conditions that I have overlooked?
Anyone who is interested in this case can try the test code: just run sudo ./run.sh. The results on my machine show that the access time for line 12 is mostly greater than 180 cycles. I don't think there is a problem with the time measurement code, because if I change the measured line from cache line 12 to cache line 6 (just change it in test.c, line 103), the access time is mostly 25 cycles.
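As a sanity check on the MSR setting, here is a small sketch of how a value written to MSR 0x1a4 decodes. It assumes the bit layout from Intel's prefetcher control disclosure (bit 0: L2 hardware prefetcher, bit 1: L2 adjacent cache line prefetcher, bit 2: DCU prefetcher, bit 3: DCU IP prefetcher, where a set bit disables the prefetcher) and that this layout applies to the CPU under test. The enum names and `prefetcher_enabled` are hypothetical helpers, not part of test.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of MSR 0x1A4: a SET bit DISABLES that prefetcher. */
enum {
    PF_L2_HW       = 1u << 0,  /* L2 hardware prefetcher */
    PF_L2_ADJACENT = 1u << 1,  /* L2 adjacent cache line prefetcher */
    PF_L1_DCU      = 1u << 2,  /* L1 DCU (next line) prefetcher */
    PF_L1_DCU_IP   = 1u << 3   /* L1 DCU IP-based stride prefetcher */
};

/* True if the prefetcher selected by `bit_mask` is left enabled by `msr_value`. */
static bool prefetcher_enabled(uint64_t msr_value, unsigned bit_mask)
{
    return (msr_value & bit_mask) == 0;
}
```

With this layout, 0x7 leaves only `PF_L1_DCU_IP` enabled, which matches the setting I use, while 0xf would disable all four prefetchers.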