I'm benchmarking software which executes 4x faster on Intel 2670QM then my serial version using all 8 of my 'logical' threads. I would like some community feedback on my perception of the benchmarking results.
When I am using 4 Threads on 4 cores I get a speed up of 4x, the entire algorithm is executed in parallell. This seems logical to me since 'Amdhals law' predicts it. Windows task manager tells me I'm using 50% of the CPU.
However if I execute the same software on all 8 threads, I get, once again a speed up of 4x and not a speed up of 8x.
If I have understood this correctly: my CPU has 4 cores with a Frequency of 2.2GHZ individually but the Frequency is divided into 1.1GHZ when applied to 8 'logical' threads and the same follows for the rest of the component such as the cache memory? If this is true then why does the task manager claim only 50% of my CPU is being used?
#define NumberOfFiles 8
...
char startLetter ='a';
#pragma omp parallel for shared(startLetter)
for(int f=0; f<NumberOfFiles; f++){
...
}
I am not including the time using disk I/O. I am only interested in the time a STL call takes(STL sort) not the disk I/O.