I'm playing around with OpenMP and I've stumbled upon something I don't understand. I'm using the following parallel code (which works correctly), and its execution time almost halves when I double the number of threads. However, the execution time with OpenMP and a single thread is 35 seconds, while commenting out the pragmas brings it down to 25 seconds! Is there something I can do to reduce this huge overhead? I'm using gcc 4.8.1 and compiling with "-O2 -Wall -fopenmp".
I've read similar questions (OpenMP with 1 thread slower than sequential version, OpenMP overhead), and the opinions there range from no overhead at all to a lot of overhead. I'm curious whether there is a better way to use OpenMP in my particular case: an outer for loop containing a parallel region with omp for loops inside.
    for (size_t k = 0; k < maxk; ++k) { // maxk is ~5000
        // init reduction variables
        const bool is_time_for_reduction = /* computed from k */;
        double mmin = INFINITY, mmax = -INFINITY;
        double sum = 0.0;

        #pragma omp parallel shared(m1, m2)
        {
            #pragma omp for
            for (size_t i = 0; i < h; ++i) { // w, h - consts, both between 1000 and 2000
                for (size_t j = 0; j < w; ++j) {
                    // computations with matrices m1 and m2, using only m1, m2 and constants w, h
                }
            }

            if (is_time_for_reduction) {
                #pragma omp for reduction(max:mmax) reduction(min:mmin) reduction(+:sum)
                for (size_t i = 0; i < h; ++i) {
                    for (size_t j = 0; j < w; ++j) {
                        // reductions into mmin, mmax, sum
                    }
                }
            }
        }

        if (is_time_for_reduction) {
            // use the "reduced" variables
        }
    }
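For reference, here is a minimal compilable sketch of the same structure. The matrix computation, the reduction body, and the `is_time_for_reduction` condition are placeholders I made up for the sketch, not my real code:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Placeholder version of the loop nest: fills m2 from m1 in parallel, then
// periodically reduces min/max/sum over m2. Returns the last reduced sum.
double process(std::size_t w, std::size_t h, std::size_t maxk) {
    std::vector<double> m1(w * h, 1.0), m2(w * h, 0.0);
    double last_sum = 0.0;

    for (std::size_t k = 0; k < maxk; ++k) {
        const bool is_time_for_reduction = (k % 5 == 4); // placeholder condition
        double mmin = INFINITY, mmax = -INFINITY;
        double sum = 0.0;

        #pragma omp parallel shared(m1, m2)
        {
            #pragma omp for
            for (std::size_t i = 0; i < h; ++i)
                for (std::size_t j = 0; j < w; ++j)
                    m2[i * w + j] = m1[i * w + j] + 1.0; // placeholder computation

            if (is_time_for_reduction) {
                #pragma omp for reduction(max:mmax) reduction(min:mmin) reduction(+:sum)
                for (std::size_t i = 0; i < h; ++i)
                    for (std::size_t j = 0; j < w; ++j) {
                        const double v = m2[i * w + j];
                        if (v > mmax) mmax = v;
                        if (v < mmin) mmin = v;
                        sum += v;
                    }
            }
        }

        if (is_time_for_reduction)
            last_sum = sum; // "use" the reduced variables
    }
    return last_sum;
}
```

If compiled without -fopenmp, gcc ignores the pragmas and the function runs serially but still produces the same results, which is how I compare against the sequential version.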