basic openmp program runs slower

Question

i am trying to make my program run faster so, i will use parallel computing. Before that, i tried on simple for loop but it runs slower.

before open mp :

int a[100000] = { 0 };
clock_t begin = clock();

for (int i = 0; i < 100000; i++)
{
    a[i] = i;
}


clock_t end = clock();
double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
printf("%lf", elapsed_secs);

After open mp :

int a[100000] = { 0 };
clock_t begin = clock();


#pragma omp parallel for
for (int i = 0; i < 100000; i++)
{
    a[i] = i;
}


clock_t end = clock();
double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
printf("%lf", elapsed_secs);

What were your results? Did you run it multiple times to get an average? — NathanOliver, Nov 20 '15 at 13:57
Sadly, OMP does not magically make everything faster. Now for your example: Do you use the values you write afterwards? Otherwise, the whole loop was probably optimized away and you are measuring non-sense. If this is not the problem, make the loop bigger. — Baum mit Augen, Nov 20 '15 at 14:11
Don't use `clock`. See http://stackoverflow.com/questions/10673732/openmp-time-and-clock-calculates-two-different-results and many other Qs and As here on SO. — High Performance Mark, Nov 20 '15 at 15:58

score 4 · Answer 1 · edited May 23 '17 at 12:30

You say that your code runs slower, but you actually don't know that. The reason is that you use clock() for measuring the time, and this function counts the CPU time of the current threads and possibly the one of all threads it spawns. For evaluating speed-ups, what you need to measure is the elapsed wall clock time. And for this purpose, OpenMP offers you omp_get_wtime(). Try using it on your code and then you'll really know whether or not your code gets any sort of benefit from OpenMP.

Now, let's be clear, your code does nothing more than writing in memory. So there is a strong likelihood that you'll saturate your memory bandwidth pretty quickly. Therefore, unless you have multiple memory controllers, it is unlikely you gain much from adding threads in this case. Please have a look at this answer to convince yourself.

And finally, make sure you do something with your data before to exit the code, otherwise, the compiler is likely to just optimise it out, leading to a code doing pretty-much nothing (but doing it very fast).

zam · Accepted Answer · 2015-11-20T21:17:24.140

To be succesfull with your first OpenMP parallel (multi-threaded) code examples you need to improve your test cases from following two perspectives:

Make your examples testable. To do that:
- make sure that your code is complex enough to not give compilers any chance to "optimize" the whole loop out (i.e. to prevent compilers from kinda replacing the whole loop with single expression)
- you may end up with need to introduce function wrapping your loop and pass argument to this function in runtime (via argc/argv) to make compiler confused, while keeping the code very simple
- make sure you use proper compilation flags (-O2 -fopenmp for GCC, some other flags for other compilers)
- make sure your loop takes enough time and that you use proper way to measure time spent in the loop (other respondents, including Gilles, have alrady pointed it out very well)
Make sure that your loop is doing enough (ideally computational) work (i.e. additions, multiplications, etc) in every loop iteration, so that various overheads associated with doing some under-the-hood work inside of OpenMP runtime library (required to "schedule"/plan/distribute iterations between threads) are not "bigger" than amount of useful work done in bunch of your loop iterations. Second and Third wikipedia OpenMP' parallel for examples are already good enough to mostly satisfy given criteria (while your example is not satisfying it). You are at the point where just following wikipedia examples will help you to gain some basic understanding.

After you learn given basics, your next steps would be (a) understanding "Data Races" / "Race Conditions" / "Loop Carried Dependencies" and (b) understanding the "difference" between #pragma omp parallel and #pragma omp for (again, you will need to find simple examples from books or basic OpenMP courses).

(to be honest, all other topics, like OpenMP imbalance, dynamic vs. static, Memory Bandwidth, will make sense only after you spend at least couple days of reading/practicing with simpler notions)

basic openmp program runs slower

2 Answers2