I'm trying to write Matrix by vector multiplication in C (OpenMP) but my program slows when I add processors...
1 proc - 1,3 s
2 proc - 2,6 s
4 proc - 5,47 s
I tested this on my PC (core i5) and our school's cluster and the result is the same (program slows)
here is my code (matrix is 10000 x 10000) and vector is 10000:
double start_time = clock();
#pragma omp parallel private(i) num_threads(4)
{
tid = omp_get_thread_num();
world_size = omp_get_num_threads();
printf("Threads: %d\n",world_size);
for(y = 0; y < matrix_size ; y++){
#pragma omp parallel for private(i) shared(results, vector, matrix)
for(i = 0; i < matrix_size; i++){
results[y] = results[y] + vector[i]*matrix[i][y];
}
}
}
double end_time = clock();
double result_time = (end_time - start_time) / CLOCKS_PER_SEC;
printf("Time: %f\n", result_time);
My question is: is there any mistake? For me it seems pretty straightforward and should speed up