I've noticed an interesting behavior in the code of this question, which also comes from Agner Fog's Optimizing software in C++ and reduces to how data is accessed and stored in the cache (cache associativity). The explanation is clear to me, but then someone raised a point about volatile:
...
That is, if we add the volatile qualifier to the matrix declaration:

volatile int mat[MATSIZE][MATSIZE];

the running time for MATSIZE = 512 drops dramatically: 2144 → 1562 μs.
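To reproduce this kind of measurement, a minimal timing sketch might look like the following. It is not the code from the linked question: it assumes a simple in-place transpose that swaps each below-diagonal element with its mirror, hypothetical global matrices mat512 and mat513 standing in for the two MATSIZE values, and std::chrono::steady_clock for timing. The helper names transpose and time_transpose_us are illustrative only.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical global matrices standing in for MATSIZE = 512 and MATSIZE = 513.
volatile int mat512[512][512];
volatile int mat513[513][513];

// In-place transpose: swap every element below the diagonal with its mirror.
// Because the matrix is volatile, every read and write below is emitted as a
// real memory access instead of being kept in a register.
template <std::size_t N>
void transpose(volatile int (&m)[N][N]) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = 0; j < i; ++j) {
            int aux = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = aux;
        }
}

// Average wall-clock time of one transpose over `samples` runs, in microseconds.
template <std::size_t N>
double time_transpose_us(volatile int (&m)[N][N], int samples) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int s = 0; s < samples; ++s) transpose(m);
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / samples;
}
```

Calling time_transpose_us(mat512, 100) and time_transpose_us(mat513, 100) from main and comparing the results gives the two timings; removing volatile from the declarations gives the non-volatile baseline. Absolute numbers will depend on the CPU, compiler and optimization flags.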
As we know, volatile prevents the compiler from caching the value in a CPU register and from optimizing away accesses to it that seem unnecessary from the program's point of view.
One possible explanation is that, with volatile, the computation happens entirely in RAM and the CPU caches are not used at all. But on the other hand, with volatile the running time for MATSIZE = 513 is again lower than for 512: 1490 μs versus 1562 μs, which would be hard to explain if the caches were bypassed.
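The cache-associativity explanation can be made concrete with a little arithmetic. The sketch below assumes a typical L1 data cache of 32 KiB, 8-way set associative, with 64-byte lines (so 64 sets), a 4-byte int, and a matrix that starts at a set-aligned address; none of these figures come from the question, so check them against your CPU. sets_touched_by_column is a hypothetical helper that counts how many distinct cache sets are hit when walking down one column of an N×N int matrix.

```cpp
#include <cstddef>
#include <set>

// Assumed L1 data cache geometry (typical, but not taken from the question):
// 32 KiB, 8-way set associative, 64-byte lines -> 32768 / (8 * 64) = 64 sets.
constexpr std::size_t kLineBytes = 64;
constexpr std::size_t kWays = 8;
constexpr std::size_t kSets = 32768 / (kWays * kLineBytes);

// Cache set index of a byte offset, assuming the matrix starts set-aligned.
constexpr std::size_t set_index(std::size_t byte_offset) {
    return (byte_offset / kLineBytes) % kSets;
}

// Hypothetical helper: number of distinct sets hit when walking down column
// `col` of an n x n int matrix (row stride = n * sizeof(int) bytes).
std::size_t sets_touched_by_column(std::size_t n, std::size_t col = 0) {
    std::set<std::size_t> sets;
    for (std::size_t row = 0; row < n; ++row)
        sets.insert(set_index((row * n + col) * sizeof(int)));
    return sets.size();
}
```

Under these assumptions sets_touched_by_column(512) returns 2 and sets_touched_by_column(513) returns 64. With MATSIZE = 512 the row stride is 2048 bytes, exactly 32 cache lines, so a column's elements map to only 2 sets and can occupy at most 2 × 8 = 16 lines of L1 at once; with 513 the extra 4 bytes per row spread the column over all 64 sets. Adding volatile does not change which addresses are accessed, so this mapping is the same with and without it.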
...