I just wrote a program to test the impact of the CPU cache on performance.
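    /* Interleaved scan: thread idx reads num[idx], num[idx + NUM_THREADS], ... */
    void* foo(void* ptr)
    {
        int* r = (int*) ptr;    /* this thread's slot in s */
        for (int i = (r - s); i < N; i += NUM_THREADS)
            *r += num[i];
        return NULL;
    }

    /* Block scan: thread idx reads one contiguous chunk of N/NUM_THREADS elements */
    void* bar(void* ptr)
    {
        int* r = (int*) ptr;    /* this thread's slot in s */
        int idx = r - s;        /* thread index, from the slot's offset in s */
        int block = N/NUM_THREADS;
        int start = idx * block, end = start + block;
        for (int i = start; i < end; ++i)
            *r += num[i];
        return NULL;
    }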
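Basically, foo() scans the array in an interleaved fashion (thread k reads elements k, k + NUM_THREADS, k + 2*NUM_THREADS, and so on), whereas bar() scans it block by block (each thread reads one contiguous chunk of N/NUM_THREADS elements).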
The test results show that bar() is much faster:
gcc ping-pong.c -std=gnu99 -lpthread -O2 ; ./a.out
1.077037s
0.395525s
So how should I interpret this result?
The full source code is at: https://gist.github.com/4617935
Update: all if-statements have been removed.