I want to test the effect of the cache, but the cache doesn't seem to make any difference.

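unsigned long long time1, time2;   // __rdtscp returns a 64-bit TSC value
int num = 1;
unsigned int junk = 0;             // receives IA32_TSC_AUX from RDTSCP; otherwise unused
_mm_clflush(&num);                 // evict num's cache line
time1 = __rdtscp(&junk);
num = 2;                           // store to the just-flushed line
time2 = __rdtscp(&junk);
printf("%llu\n", time2 - time1);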

The output of this code is 59.

unsigned long long time1, time2;
int num = 1;
unsigned int junk = 0;
//_mm_clflush(&num);               // flush disabled: num should stay cached
num = 3;                           // touch num so its line is in cache
time1 = __rdtscp(&junk);
num = 2;
time2 = __rdtscp(&junk);
printf("%llu\n", time2 - time1);

The output of this code is also 59. In the first snippet `num` has been flushed from the cache, while in the second it should still be cached. Why are the two timings the same?

Peter Cordes
Gerrie
• Did you disassemble and verify that the variables are indeed stored in RAM and not registers? And that the subsequent, superfluous writes to the same variable aren't optimized away? – Lundin Oct 12 '20 at 10:34
• Also, *read* the value again. Why would cache speed up anything if you don't do any reading? – Marco Bonelli Oct 12 '20 at 10:51
• Even if `num` was `volatile`, retiring a store instruction doesn't have to wait for it to commit to L1d cache (and thus doesn't have to wait for store misses.) [The store buffer](https://stackoverflow.com/questions/64141366/can-a-speculatively-executed-cpu-branch-contain-opcodes-that-access-ram) is doing its job, decoupling execution from store misses. https://en.wikipedia.org/wiki/MESI_protocol#Store_Buffer. – Peter Cordes Oct 12 '20 at 11:45
• Note that 59 reference cycles indicates that you definitely didn't have to wait for DRAM, only RDTSCP overhead at whatever non-max clock speed, so yes the CPU was effective in hiding cache-miss latency. It's true that the cache played no part in this, though. – Peter Cordes Oct 12 '20 at 11:48
