I'm new to OpenMP and wanted to use it in a larger project. I did, but without success. Each iteration of the parallel for loop computes a sequential Cholesky decomposition of a matrix, yet the parallel code was about 10 times slower than the sequential code.
Because the parallel Cholesky code was so slow, I wrote a small example to get a better understanding of OpenMP. But my OpenMP code is still slower than the sequential code (the same code without the parallel pragma). Here is the simple code:
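```c
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void)
{
    clock_t start, ende;
    double totalTime;
    int i, n = 100000000;
    double s = 1.23;
    double *x;
    double *y;

    /* Two arrays of 10^8 doubles, about 800 MB each. */
    x = (double *) calloc(n, sizeof(double));
    y = (double *) calloc(n, sizeof(double));
    if (x == NULL || y == NULL) {
        fprintf(stderr, "calloc failed\n");
        return 1;
    }

    /* Serial initialisation of both arrays. */
    for (i = 0; i < n; i++) {
        x[i] = (double) ((i + 1) % 17);
        y[i] = (double) ((i + 1) % 31);
    }

    /* clock() returns the processor time used by the program (C standard). */
    start = clock();
    /* The loop variable of an "omp parallel for" is private by default,
       so private(i) is redundant but harmless. */
    #pragma omp parallel for num_threads(4) private(i)
    for (i = 0; i < n; i++) {
        x[i] = x[i] + s * y[i];
    }
    ende = clock();

    totalTime = (ende - start) / (double) CLOCKS_PER_SEC;
    printf("Time: %.10f s\n", totalTime);

    free(x);
    free(y);
    return 0;
}
```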
My times are 0.625 s with the parallel code and 0.328 s with the sequential code. The fewer threads I request with num_threads(), the better the times: 0.453 s with num_threads(2) and 0.344 s with num_threads(1).
Can someone help me understand the behaviour of the small example, and why the parallel Cholesky decomposition is slower instead of faster?