I just made some benchmarks for this super question/answer Why is my program slow when looping over exactly 8192 elements?
I want to do benchmark on one core so the program is single threaded. But it doesn't reach 100% usage of one core, it uses 60% at most. So my tests are not acurate.
I'm using Qt Creator, compiling using MinGW release mode.
Are there any parameters to setup for better performance ? Is it normal that I can't leverage CPU power ? Is it Qt related ? Is there some interruptions or something preventing code to run at 100%...
Here is the main loop
// horizontal sums for first two lines
for(i=1;i<SIZE*2;i++){
hsumPointer[i]=imgPointer[i-1]+imgPointer[i]+imgPointer[i+1];
}
// rest of the computation
for(;i<totalSize;i++){
// compute horizontal sum for next line
hsumPointer[i]=imgPointer[i-1]+imgPointer[i]+imgPointer[i+1];
// final result
resPointer[i-SIZE]=(hsumPointer[i-SIZE-SIZE]+hsumPointer[i-SIZE]+hsumPointer[i])/9;
}
This is run 10 times on an array of SIZE*SIZE float with SIZE=8193, the array is on the heap.