
I'm new to OpenMP and wanted to use it in a larger project, but without success. Each iteration of the parallel for-loop should compute a sequential Cholesky decomposition of a matrix, but the parallel code was about 10 times slower than the sequential version.

Because of that, I wrote a small example to get a better understanding of OpenMP. But somehow my OpenMP code is still slower than the sequential code (the same code, just without the parallel pragma). Here is the simple example:

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main ( )
{
    clock_t start, ende;
    double totalTime;

    int i, n = 100000000;
    double s = 1.23;
    double *x;
    double *y;
    x = (double *) calloc(n, sizeof(double));
    y = (double *) calloc(n, sizeof(double));

    for ( i = 0; i < n; i++ ){
        x[i] = ( double ) ( ( i + 1 ) % 17 );
        y[i] = ( double ) ( ( i + 1 ) % 31 );
    }

    start = clock();

    #pragma omp parallel for num_threads(4) private(i)
    for ( i = 0; i < n; i++ ){
        x[i] = x[i] + s * y[i];
    }

    ende = clock();
    totalTime = (ende - start) / (double) CLOCKS_PER_SEC;
    printf("Time: %.10f s\n", totalTime);

    free(x);
    free(y);

    return 0;
}

My times are 0.625 s with the parallel code and 0.328 s with the sequential code. As I lower the thread count in num_threads() I get better times: 0.453 s with num_threads(2) and 0.344 s with num_threads(1).

Can someone help me understand the small example code, and why the Cholesky decomposition doesn't speed up either?

  • `clock` returns CPU time, which is not wall-clock time: it is (here) the combined time of the multiple cores. Try making your loop slower (e.g., add a `log` calculation or so for fun) so that the whole process takes about 10 seconds single-threaded, and compare your wall-clock times when running on the command line. You'll find 4 threads *is* a lot faster, yet the elapsed time is still reported as being slower. –  Oct 09 '16 at 23:25
  • You haven't posted your sequential code. If your sequential code is as you say - identical with the exception of the OpenMP `#parallel ...` pragma - your posted OpenMP code is doing *more* than the sequential code. – Andrew Henle Oct 09 '16 at 23:25
  • In case the preceding explanations aren't sufficient, your compiler may auto-vectorize in the absence of omp directive but not when omp parallel for is set without the simd clause. This type of loop is likely to slow down when more than 1 thread per core is run (e.g. in presence of hyperthreading). You should divulge your affinity settings et al. – tim18 Oct 10 '16 at 00:03
  • @Evert Thank you very much! It really was the wrong function to measure timings in a parallel code. With the other function mentioned in your linked thread the timings were indeed faster. – ops Oct 10 '16 at 17:38

0 Answers