I am trying to understand CPU caches and cache lines using a C program, as I do for most C concepts. The program that I use is given below. I got the idea from the blog.
http://igoro.com/archive/gallery-of-processor-cache-effects/
Now the output of the below program on my machine is as given below. This is the output with CFLAGS="-g -O0 -Wall".
./cache
CPU time for loop 1 0.460000 secs.
CPU time for loop 2 (j = 8) 0.050000 secs.
CPU time for loop 2 (j = 9) 0.050000 secs.
CPU time for loop 2 (j = 10) 0.050000 secs.
CPU time for loop 2 (j = 11) 0.050000 secs.
CPU time for loop 2 (j = 12) 0.040000 secs.
CPU time for loop 2 (j = 13) 0.050000 secs.
CPU time for loop 2 (j = 14) 0.050000 secs.
CPU time for loop 2 (j = 15) 0.040000 secs.
CPU time for loop 2 (j = 16) 0.050000 secs.
CPU time for loop 2 (j = 17) 0.040000 secs.
CPU time for loop 2 (j = 18) 0.050000 secs.
CPU time for loop 2 (j = 19) 0.040000 secs.
CPU time for loop 2 (j = 20) 0.040000 secs.
CPU time for loop 2 (j = 21) 0.040000 secs.
CPU time for loop 2 (j = 22) 0.040000 secs.
CPU time for loop 2 (j = 23) 0.040000 secs.
CPU time for loop 2 (j = 24) 0.030000 secs.
CPU time for loop 2 (j = 25) 0.040000 secs.
CPU time for loop 2 (j = 26) 0.030000 secs.
CPU time for loop 2 (j = 27) 0.040000 secs.
CPU time for loop 2 (j = 28) 0.030000 secs.
CPU time for loop 2 (j = 29) 0.040000 secs.
CPU time for loop 2 (j = 30) 0.030000 secs.
CPU time for loop 2 (j = 31) 0.030000 secs.
The output with optimization (CFLAGS=-g -O3 -Wall
)
CPU time for loop 1 0.130000 secs.
CPU time for loop 2 (j = 8) 0.040000 secs.
CPU time for loop 2 (j = 9) 0.050000 secs.
CPU time for loop 2 (j = 10) 0.050000 secs.
CPU time for loop 2 (j = 11) 0.040000 secs.
CPU time for loop 2 (j = 12) 0.040000 secs.
CPU time for loop 2 (j = 13) 0.050000 secs.
CPU time for loop 2 (j = 14) 0.050000 secs.
CPU time for loop 2 (j = 15) 0.040000 secs.
CPU time for loop 2 (j = 16) 0.040000 secs.
CPU time for loop 2 (j = 17) 0.050000 secs.
CPU time for loop 2 (j = 18) 0.040000 secs.
CPU time for loop 2 (j = 19) 0.050000 secs.
CPU time for loop 2 (j = 20) 0.040000 secs.
CPU time for loop 2 (j = 21) 0.040000 secs.
CPU time for loop 2 (j = 22) 0.040000 secs.
CPU time for loop 2 (j = 23) 0.030000 secs.
CPU time for loop 2 (j = 24) 0.040000 secs.
CPU time for loop 2 (j = 25) 0.030000 secs.
CPU time for loop 2 (j = 26) 0.040000 secs.
CPU time for loop 2 (j = 27) 0.030000 secs.
CPU time for loop 2 (j = 28) 0.030000 secs.
CPU time for loop 2 (j = 29) 0.030000 secs.
CPU time for loop 2 (j = 30) 0.030000 secs.
CPU time for loop 2 (j = 31) 0.030000 secs.
It is pointed out in the blog that
The first loop multiplies every value in the array by 3, and the second loop multiplies > only every 16-th. The second loop only does about 6% of the work of the first loop, but on modern machines, the two for-loops take about the same time: 80 and 78 ms respectively on my machine.
This doesn't seem to be the case on my machine. As you can see, the time for the execution of
loop 1 is 0.46 seconds.
and that for
loop 2 is 0.03 seconds or 0.04 seconds or 0.05 seconds
for different values of j.
Why is this happening?
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <unistd.h>
#include <stdlib.h>
#define MAX_SIZE (64*1024*1024)
int main()
{
clock_t start, end;
double cpu_time;
int i = 0;
int j = 0;
/* MAX_SIZE array is too big for stack. This is an unfortunate rough edge of the way the stack works.
It lives in a fixed-size buffer, set by the program executable's configuration according to the
operating system, but its actual size is seldom checked against the available space. */
/* int arr[MAX_SIZE]; */
int *arr = (int*)malloc(MAX_SIZE * sizeof(int));
/* CPU clock ticks count start */
start = clock();
/* Loop 1 */
for (i = 0; i < MAX_SIZE; i++)
arr[i] *= 3;
/* CPU clock ticks count stop */
end = clock();
cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
printf("CPU time for loop 1 %.6f secs.\n", cpu_time);
for (j = 8 ; j < 32 ; j++)
{
/* CPU clock ticks count start */
start = clock();
/* Loop 2 */
for (i = 0; i < MAX_SIZE; i += j)
arr[i] *= 3;
/* CPU clock ticks count stop*/
end = clock();
cpu_time = ((double) (end - start)) / CLOCKS_PER_SEC;
printf("CPU time for loop 2 (j = %d) %.6f secs.\n", j, cpu_time);
}
return 0;
}