I have an O(n^3) matrix multiplication function in C.
#include <stdio.h>
#include <omp.h>

void matrixMultiplication(int N, double **A, double **B, double **C, int threadCount) {
    int i = 0, j = 0, k = 0, tid;
    /* Use the requested thread count rather than a hard-coded 4. */
    #pragma omp parallel num_threads(threadCount) shared(N, A, B, C, threadCount) private(i, j, k, tid)
    {
        tid = omp_get_thread_num();
        /* Distribute the rows of C across the threads; start at row 0 so no row is skipped. */
        #pragma omp for
        for (i = 0; i < N; i++)
        {
            printf("Thread %d starting row %d\n", tid, i);
            for (j = 0; j < N; j++)
            {
                for (k = 0; k < N; k++)
                {
                    C[i][j] = C[i][j] + A[i][k] * B[k][j];
                }
            }
        }
    }
    return;
}
I am using OpenMP to parallelize this function by splitting up the multiplications. I am performing this computation on square matrices of size N = 3000 with a 1.8 GHz Intel Core i5 processor.
This processor has two physical cores and two virtual cores. I measured the following run times for my computation:
- 1 thread: 526.06 s
- 2 threads: 264.531 s
- 3 threads: 285.195 s
- 4 threads: 279.914 s
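To put these timings on a common scale, they can be converted to speedup (single-thread time divided by parallel time) and parallel efficiency (speedup divided by thread count). The following is only a minimal sketch of that arithmetic; the helper names `speedup` and `efficiency` are hypothetical and not part of the program above.

```c
#include <assert.h>

/* Speedup of a parallel run relative to the single-thread run: T1 / Tp. */
double speedup(double serial_seconds, double parallel_seconds) {
    assert(serial_seconds > 0.0 && parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

/* Parallel efficiency: speedup divided by the number of threads used. */
double efficiency(double serial_seconds, double parallel_seconds, int threads) {
    assert(threads > 0);
    return speedup(serial_seconds, parallel_seconds) / threads;
}
```

Applied to the measurements above, this gives a speedup of roughly 1.99 with 2 threads, 1.84 with 3 threads, and 1.88 with 4 threads, which corresponds to an efficiency of about 0.99, 0.61, and 0.47 respectively.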
I had expected my gains to continue until the number of threads reached four. However, this clearly did not occur.
Why did this happen? Is it because the performance of a core is equal to the sum of its physical and virtual cores?