I am writing a program to parse a file. Its main loop walks the file character by character and processes each one. Here is the main loop:
    char c;
    char * ptr;
    for( size_t i = 0; i < size ; ++i )
    {
        ptr = ( static_cast<char*>(sentenceMap) + i );
        c = *ptr;
        __builtin_prefetch( ptr + i + 1 );
        // some treatment on ptr and c
    }
As you can see, I added a __builtin_prefetch instruction, hoping to bring the data for the next iteration of the loop into cache. I tried different arguments (ptr+i+1, ptr+i+2, ptr+i+10), but nothing seems to change.
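For anyone who wants to reproduce this, here is a self-contained sketch of the loop. It is only an approximation of the real program: the function name scan, the distance parameter, and the checksum standing in for "some treatment on ptr and c" are hypothetical and were introduced for illustration. The prefetch address is the same expression as in the loop above, written out on integers so that no out-of-range pointer is ever formed. __builtin_prefetch is the GCC/Clang builtin, so this assumes one of those compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the real loop: reads the buffer one char at a
// time and issues a prefetch hint using the same address expression as the
// original code, with the constant replaced by a `distance` parameter.
std::uint64_t scan(const void* sentenceMap, std::size_t size, std::size_t distance)
{
    assert(sentenceMap != nullptr || size == 0);

    std::uint64_t checksum = 0;  // placeholder for "some treatment on ptr and c"
    const auto base = reinterpret_cast<std::uintptr_t>(sentenceMap);

    for (std::size_t i = 0; i < size; ++i)
    {
        const char* ptr = static_cast<const char*>(sentenceMap) + i;
        const char c = *ptr;

        // Original expression: ptr + i + distance. Because ptr is already
        // base + i, that address is base + 2*i + distance. It is computed
        // on integers here; a prefetch hint does not fault on an address
        // that lies outside the buffer.
        __builtin_prefetch(reinterpret_cast<const void*>(base + 2 * i + distance));

        checksum += static_cast<unsigned char>(c);
    }
    return checksum;
}
```

Calling scan from a small main on a large buffer (for example a std::vector<char>) and running the binary under a cache profiler with different distance values is one way to repeat the comparison.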
To measure performance, I use Valgrind's cachegrind tool, which reports the number of cache misses. On the line c = *ptr, cachegrind records 632,378 DLmr (L3 cache read misses) when __builtin_prefetch is not used. What is odd is that this value does not change when I add __builtin_prefetch, regardless of the argument I pass to it.
Is there an explanation for this?