I am trying to understand cache thrashing. Is the following text correct?
Take the code below as an example.
enum { max = 1024 * 1024 };   /* elements per vector */
long a[max], b[max], c[max], d[max], e[max];
for (long i = 0; i < max; i++)
    a[i] = b[i] * c[i] + d[i] * e[i];
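The ARM Cortex-A9 L1 data cache is four-way set associative with 32-byte cache lines and a total size of 32 KB, so there are 1024 cache lines in total, organized as 256 sets of four. The loop touches five vectors, and because elements at the same index of each vector map to the same set, five lines compete for four ways; to carry out the calculation above, one cache line must be displaced. When the result for a is stored, the line holding b is thrown out. Then, as the loop iterates, b is needed again, and so a line from another vector is displaced. In the example above, there is no cache reuse.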
To solve this problem, you can introduce padding between the vectors to stagger their starting addresses. Ideally, each pad should be at least the size of a full cache line.
The problem above can be solved as follows:
long a[max], pad1[256], b[max], pad2[256], c[max], pad3[256], d[max], pad4[256], e[max];
For multidimensional arrays, it is enough to make the leading dimension an odd number.
Can anyone confirm whether the above is true, or point out where I have made an error?
Thanks.