I need help creating a minimal example showing openMP speed-up on my system

Question

So, in the aftermath of my previous attempt with OpenMP, I have realized that I don't have a single example of code that actually runs faster on my system when parallelized than when run serially. Below is a short attempt (which fails). It first shows that there are indeed two cores and that OpenMP is putting them to use, and then times two brain-dead tasks, one using OpenMP and the other not. It is very possible that something is wrong with the task I'm testing on, so I'd appreciate it if someone could come up with another sanity test, just so I can see with my own eyes that multi-threading CAN work :)

#include <iostream>
#include <vector>
#include <ctime>
#include <cmath>

using namespace std;

#include <omp.h>

int main(int argc, char *argv[])
{


    //Below code will be run once for each processor (there are two)
    #pragma omp parallel 
    {
        cout << omp_get_thread_num() << endl; //this should output 1 and 0, in random order
    }


    //The parallel example:
    vector <double> a(50000,0);

    clock_t start = clock();
#pragma omp parallel for  shared(a) 
    for (int i=0; i < 50000; i++)    
    {
        double StartVal=i;

        for (int j=0; j<2000; ++j)
            a[i]=(StartVal + log(exp(exp((double) i)))); 
    } 

    cout<< "Time: " << ( (double) ( clock() - start ) / (double)CLOCKS_PER_SEC ) <<endl;

    //The serial example:
    start = clock();

    for (int i=0; i < 50000; i++)    
    {
        double StartVal=i;

        for (int j=0; j<2000; ++j)
            a[i]=(StartVal + log(exp(exp((double) i)))); 
    } 

    cout<< "Time: " << ( (double) ( clock() - start ) / (double)CLOCKS_PER_SEC ) <<endl;

    return 0;
}

The output is:

    1
    0
    Time: 4.07
    Time: 3.84

Could it have something to do with a for-loop optimization that OpenMP is missing out on? Or is there something wrong with how I measure time? In that case, do you have any ideas for a different test?

Thank you in advance :)

EDIT: It did indeed turn out that I was measuring time in a bad way. Using omp_get_wtime(), the output becomes:

    1
    0
    Time: 4.40776
    Time: 7.77676

I guess I'd better go back and have another look at my old question then...

score 2 · Accepted Answer · edited May 23 '17 at 11:56

There are two possibilities I can think of:

1. If you are running on Linux, clock() doesn't measure wall time there. It measures CPU time, which is summed over all threads, so a parallel run can look no faster than a serial one. I suggest you use omp_get_wtime() instead.
2. Your test isn't large enough. Try increasing 2000 to something like 200000.
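To make the first point concrete, here is a minimal sketch that times one parallel loop with both clocks. The names `Timing`, `wall_now` and `time_parallel_loop` are hypothetical helpers invented for this illustration, the square-root workload is a stand-in for the loop in the question, and the `std::chrono` fallback is only an assumption so that the code still builds when OpenMP is not enabled.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <ctime>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical result type for this sketch.
struct Timing {
    double cpu_seconds;   // from clock(): processor time
    double wall_seconds;  // elapsed real time
    double checksum;      // keeps the compiler from discarding the work
};

// Wall-clock seconds; falls back to std::chrono when built without OpenMP.
double wall_now() {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    using clk = std::chrono::steady_clock;
    return std::chrono::duration<double>(clk::now().time_since_epoch()).count();
#endif
}

// Times one parallel loop with both clocks so the two readings can be compared.
Timing time_parallel_loop(int n, int inner) {
    assert(n > 0 && inner > 0);
    std::vector<double> a(n, 0.0);
    const std::clock_t c0 = std::clock();
    const double w0 = wall_now();
#pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        double acc = 0.0;
        for (int j = 0; j < inner; ++j)
            acc += std::sqrt(static_cast<double>(i + j + 1));
        a[i] = acc;
    }
    const double w1 = wall_now();
    const std::clock_t c1 = std::clock();
    double checksum = 0.0;
    for (double x : a) checksum += x;
    return {static_cast<double>(c1 - c0) / CLOCKS_PER_SEC, w1 - w0, checksum};
}
```

Built with OpenMP enabled (for example `-fopenmp` with GCC) on a multi-core Linux machine, `cpu_seconds` would be expected to come out larger than `wall_seconds`, roughly in proportion to the number of busy threads. Only `wall_seconds` says anything about speed-up.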

Here's what I get on Windows using 200000 iterations on the inner loop:

    4
    5
    2
    3
    1
    6
    7
    0
    Time: 1.834
    Time: 6.792

My answer for this question has a very simple OpenMP example that achieves speedup.
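For readers without that answer at hand, the following is an independent sketch of such a sanity test. It is not the code from the answer referred to above. `heavy_sum` and `seconds_for` are hypothetical names, the sine workload is an arbitrary choice, and `std::chrono::steady_clock` is used here in place of omp_get_wtime() so that the timing also works in a build without OpenMP.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical workload: each iteration is independent and moderately expensive.
double heavy_sum(long n, bool parallel) {
    assert(n >= 0);
    double total = 0.0;
    // reduction(+ : total) gives each thread a private partial sum;
    // if (parallel) lets the same loop run serially for comparison.
#pragma omp parallel for reduction(+ : total) if (parallel)
    for (long i = 0; i < n; ++i) {
        const double s = std::sin(static_cast<double>(i) * 1e-3);
        total += s * s;
    }
    return total;
}

// Wall-clock seconds taken by one call to heavy_sum; the sum is stored in *result.
double seconds_for(long n, bool parallel, double* result) {
    const auto t0 = std::chrono::steady_clock::now();
    *result = heavy_sum(n, parallel);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Calling `seconds_for` once with `parallel` set to false and once with it set to true, for a large `n`, and dividing the first time by the second gives the measured speed-up. The two sums should agree up to floating-point rounding, which is a useful check that the parallel version computes the same thing.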

score 0 · Answer 2 · answered Feb 24 '12 at 20:34

As Mystical said, it's perhaps just how you measure. On Linux you can use clock_gettime. Using a Linux VirtualBox guest with 4 cores (I have 6 physical) I get about a factor of 3 speed-up for your code, even with inner loops as small as j<20.

#include <time.h>   // clock_gettime and timespec are declared here (POSIX)

int main( ... ) ... same as your code ...    
   timespec ts1;
   timespec ts2;

   //start measurement:
   clock_gettime(CLOCK_REALTIME, &ts1);

   ... code to time here ...

   //stop measurement:

   clock_gettime(CLOCK_REALTIME, &ts2);

   cout<< "clock Time s: " << (ts2.tv_sec-ts1.tv_sec) + 1e-9*( ts2.tv_nsec-ts1.tv_nsec ) <<endl;

  ... }
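The elapsed-time arithmetic in the snippet above can be wrapped in two small helpers. This is a sketch under assumptions: `monotonic_now` and `seconds_between` are hypothetical names, and CLOCK_MONOTONIC is swapped in for the CLOCK_REALTIME used above because it is not affected by adjustments to the system clock. It needs a POSIX system.

```cpp
#include <cassert>
#include <time.h>  // POSIX: clock_gettime, timespec, CLOCK_MONOTONIC

// Elapsed seconds from start to stop, combining the seconds and nanoseconds fields.
double seconds_between(const timespec& start, const timespec& stop) {
    return static_cast<double>(stop.tv_sec - start.tv_sec) +
           1e-9 * static_cast<double>(stop.tv_nsec - start.tv_nsec);
}

// Current reading of the monotonic clock (assumption: swapped in for CLOCK_REALTIME).
timespec monotonic_now() {
    timespec ts;
    const int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);  // clock_gettime returns 0 on success
    (void)rc;         // avoids an unused-variable warning when NDEBUG is defined
    return ts;
}
```

Take one reading with `monotonic_now()` before the loop and one after it, then pass both to `seconds_between`. The nanosecond difference may be negative when the second reading falls early in a later second, and the sum still comes out right.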
