I just read a blog post here and tried to reproduce what it shows in examples 1 and 2. Here is my code:
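int doSomething(long numLoop, int cacheSize) {
    long k;
    int arr[1000000];   /* lives on the stack and is never initialized */
    for (k = 0; k < numLoop; k++) {
        int i;
        /* cacheSize is used as the stride between touched elements */
        for (i = 0; i < 1000000; i += cacheSize) arr[i] = arr[i];
    }
    return 0;
}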
According to the blog post, the execution times of doSomething(1000,2) and doSomething(1000,1) should be almost the same, but I measured 2.1 s and 4.3 s respectively. Can anyone help me explain why? Thank you.
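To make that expectation concrete, here is a small sketch that counts how many distinct cache lines a strided walk touches. It assumes 64-byte cache lines, a 4-byte int, and an array that starts on a line boundary; none of these is guaranteed on a given machine. lines_touched is a hypothetical helper written for this illustration, not something from the blog post.

```c
#include <stddef.h>

/* Count the distinct cache lines touched when walking n elements of
   elem_size bytes with the given stride (stride must be > 0).
   line_size is an assumption about the hardware, e.g. 64 bytes.
   The array is assumed to start on a cache-line boundary. */
size_t lines_touched(size_t n, size_t stride, size_t elem_size, size_t line_size)
{
    size_t count = 0;
    size_t last_line = (size_t)-1;   /* sentinel: no line seen yet */

    for (size_t i = 0; i < n; i += stride) {
        size_t line = (i * elem_size) / line_size;
        if (line != last_line) {     /* indices only grow, so lines never repeat */
            count++;
            last_line = line;
        }
    }
    return count;
}
```

Under those assumptions, lines_touched(1000000, 1, 4, 64) and lines_touched(1000000, 2, 4, 64) both give 62500, so the two calls pull the same amount of memory through the cache even though stride 2 executes half as many loop iterations.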
Update 1: I have now made the array 100 times larger and allocate it on the heap:
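#include <stdlib.h>

int doSomething(long numLoop, int cacheSize) {
    long k;
    int *buffer;
    buffer = (int *) malloc(100000000 * sizeof(int));
    for (k = 0; k < numLoop; k++) {
        int i;
        for (i = 0; i < 100000000; i += cacheSize) buffer[i] = buffer[i];
    }
    free(buffer);   /* release the buffer so repeated calls do not leak */
    return 0;
}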
Unfortunately, the execution times of doSomething(10,2) and doSomething(10,1) still differ substantially: 3.02 s and 5.65 s. Can anyone test this on their machine?
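A self-contained harness for such a test might look like the sketch below. It is not the code timed above: timeStride is a hypothetical helper, it measures CPU time with clock(), and it deliberately replaces buffer[i] = buffer[i] with buffer[i] *= 3 on unsigned values, on the assumption that a self-assignment may be removed by an optimizing compiler. The buffer is filled before the timer starts so that first-touch page faults are not counted.

```c
#include <stdlib.h>
#include <time.h>

/* Walk an n-element buffer numLoop times with the given stride
   (stride must be > 0) and return the elapsed CPU time in seconds.
   Returns -1.0 if the allocation fails. */
double timeStride(long numLoop, size_t stride, size_t n)
{
    unsigned *buffer = malloc(n * sizeof *buffer);
    if (buffer == NULL)
        return -1.0;

    /* Touch every element first so the pages are mapped before timing. */
    for (size_t i = 0; i < n; i++)
        buffer[i] = 1;

    clock_t start = clock();
    for (long k = 0; k < numLoop; k++)
        for (size_t i = 0; i < n; i += stride)
            buffer[i] *= 3;          /* unsigned arithmetic wraps, no overflow UB */
    clock_t end = clock();

    /* Fold the buffer into a volatile sink so the stores stay observable. */
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buffer[i];
    volatile unsigned sink = sum;
    (void)sink;

    free(buffer);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling timeStride(10, 1, 100000000) and timeStride(10, 2, 100000000) from main and printing both results gives a pair of numbers comparable to the ones above. Results will depend on the compiler, the optimization level, and the machine.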