
I have a computation on a two-dimensional matrix whose performance I need to improve with OpenMP:

#pragma omp parallel private(i, j, k) num_threads(4)
{
    for(i=0; i<N; i++) {
        #pragma omp for schedule(dynamic, 2) // cannot be moved above the outer (i) loop because of the algorithm
        for(j=i+1; j<N; j++) {
            for(k=i+1; k<N; k++) {
                A[j][k] -= A[i][k]*A[j][i];
            }
        }
    }
}

Could I improve cache coherence, or use tasks, to get a speedup?

Thanks!

Note: The code above is part of an LU factorization algorithm; I simplified it for the question. The following is the real code that I use:

#pragma omp parallel private(i, j, k) num_threads(4)
{
    for(i=0; i<N; i++) {
        #pragma omp for schedule(dynamic, 2) nowait
        for(j=i+1; j<N; j++) {
            A[j][i] = A[j][i]/A[i][i];
        }

        #pragma omp barrier

        #pragma omp for schedule(dynamic, 2) nowait
        for(j=i+1; j<N; j++) {
            for(k=i+1; k<N; k++) {
                A[j][k] -= A[i][k]*A[j][i];
            }
        }

        #pragma omp barrier
    }
}
Trong Lam Phan
  • What kind of math operation is this? Is it related to Cholesky decomposition? – Z boson Oct 18 '14 at 22:55
  • It's LU decomposition. In fact, the real algorithm doesn't run exactly like that, but I wanted to simplify my question. – Trong Lam Phan Oct 18 '14 at 23:50
  • Have you seen this http://stackoverflow.com/questions/22479258/cholesky-decomposition-with-openmp/23063655#23063655? Yes, you can improve cache coherence. You should use loop tiling. It's very difficult to get the maximum efficiency. If you want to know how, see http://www.openblas.net/ – Z boson Oct 19 '14 at 09:15
  • You're right. It's 1.5 times faster than the old one. Thanks! – Trong Lam Phan Oct 19 '14 at 20:04
  • Aside from the better fixes above, the two explicit barriers are unnecessary, since there's an implicit barrier at the end of each omp for. – Jim Cownie Oct 20 '14 at 11:42
  • The explicit barriers are there because of nowait. I don't know why, but it is faster when I use nowait. – Trong Lam Phan Oct 20 '14 at 15:17
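
On the barrier discussion above: an `omp for` carries an implicit barrier only when `nowait` is absent, so `nowait` followed by an explicit `#pragma omp barrier` synchronizes at the same points as a plain `omp for`. One way to drop a synchronization point per step, rather than just move it, might be to fuse the two `j` loops: each row `j` only needs its own multiplier `A[j][i]/A[i][i]`, so the division and the row update can happen in the same iteration. The following is a minimal sketch under stated assumptions, not a tuned implementation: the names `lu_serial` and `lu_fused` are hypothetical, the matrix is assumed to be stored as a flat row-major array instead of `A[j][k]`, no pivoting is done (as in the question), and `schedule(static)` is my choice because every row does the same amount of work within one step. It does not implement the loop tiling suggested above.

```c
#include <assert.h>
#include <stddef.h>

/* Serial reference: in-place LU factorization without pivoting of an
   n x n row-major matrix, with the same loop structure as the question. */
void lu_serial(double *a, int n)
{
    assert(a != NULL && n > 0);
    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++)
            a[j * n + i] /= a[i * n + i];
        for (int j = i + 1; j < n; j++)
            for (int k = i + 1; k < n; k++)
                a[j * n + k] -= a[i * n + k] * a[j * n + i];
    }
}

/* Fused variant (hypothetical name): the multiplier for row j is computed
   and used in the same iteration, so each step i needs one barrier only. */
void lu_fused(double *a, int n)
{
    assert(a != NULL && n > 0);
    #pragma omp parallel num_threads(4)
    {
        /* i is declared inside the parallel region, so it is private. */
        for (int i = 0; i < n; i++) {
            /* Row i was finalized in step i-1 and is read-only here. */
            const double *pivot_row = a + (size_t)i * n;
            const double pivot = pivot_row[i];

            #pragma omp for schedule(static)
            for (int j = i + 1; j < n; j++) {
                double *row = a + (size_t)j * n;  /* row j is owned by one thread */
                const double l = row[i] / pivot;  /* multiplier, kept in a local */
                row[i] = l;
                for (int k = i + 1; k < n; k++)
                    row[k] -= pivot_row[k] * l;
            }
            /* Implicit barrier here (no nowait): step i+1 reads row i+1. */
        }
    }
}
```

Compiled without OpenMP support the pragmas are ignored and `lu_fused` runs serially, which makes it easy to compare against `lu_serial` on a small matrix before timing anything. Whether this beats the `nowait` version on a given machine is something only a measurement can tell.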

0 Answers