2

I have parallelized image convolution and LU factorization using OpenMP and Intel TBB, and I am testing them on 1-8 cores. When I run on one core, by specifying a single thread with omp_set_num_threads(1) for OpenMP and task_scheduler_init InitTBB(1) for TBB, TBB shows a small performance degradation compared to the sequential code due to TBB overhead. Surprisingly, OpenMP doesn't show any overhead on a single core and performs exactly the same as the sequential code (using the Intel compiler's O3 optimization level). I am using static scheduling for the OpenMP loops. Is this realistic, or am I making a mistake?

Akhtar Ali
  • Does this also happen if the argument `num_threads` to `omp_set_num_threads(num_threads)` is only known at run time, i.e. from user input? – Walter Mar 20 '13 at 18:33

4 Answers

2

The OpenMP runtime will probably not create any threads if you run it with just one thread.

Also, simply adding OpenMP parallelization directives sometimes makes serial code run faster, because you are essentially giving the compiler more information. A work-sharing construct, for example, tells the compiler that the iterations of the loop are independent of each other, which it might not have been able to deduce on its own, and which allows it to use more aggressive optimization strategies. This does not always happen, of course, but I have seen it happen with "real world code".
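A minimal sketch of the kind of loop this applies to, assuming a simple 1-D convolution (the function `convolve` and its signature are hypothetical, not the asker's code). Each output element depends only on the inputs, so the work-sharing directive states that independence explicitly. Whether this actually speeds up a single-threaded run depends on the compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D convolution over the "valid" region only.
// Every out[i] is computed from the inputs alone, so iterations are independent.
std::vector<float> convolve(const std::vector<float>& in,
                            const std::vector<float>& kernel) {
    assert(!kernel.empty() && in.size() >= kernel.size());
    const long n = static_cast<long>(in.size() - kernel.size() + 1);
    const long k = static_cast<long>(kernel.size());
    std::vector<float> out(static_cast<std::size_t>(n));

    // Static scheduling, as in the question. The pragma is ignored when the
    // code is compiled without OpenMP support, leaving a plain serial loop.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i) {
        float acc = 0.0f;
        for (long j = 0; j < k; ++j)
            acc += in[static_cast<std::size_t>(i + j)] *
                   kernel[static_cast<std::size_t>(j)];
        out[static_cast<std::size_t>(i)] = acc;
    }
    return out;
}
```

The result is the same with or without the pragma, which makes it easy to time the serial build against the one-thread OpenMP build.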

cschleiden
  • "OpenMP parallelization directives sometimes makes also serial code run faster as you are essentially giving the compiler more information" - this is interesting. I have unfortunately seen slight performance degradation when I use 1 thread with an OpenMP pragma, and have since been using #ifdefs to avoid it when there is only 1 thread. Did you observe this w.r.t. a specific compiler/code combination? – Sayan Mar 19 '12 at 19:54
  • Wow, I had never heard of that either. I'll have to try it. Interesting! – DirkMausF Jun 14 '12 at 11:28
0

OpenMP forks a decorated part of the code (#pragma omp for/parallel) into a main thread (which would also execute it without OpenMP) and additional threads.

If you configure it to use only 1 thread, then there is only the main thread, which executes the code as it would without the OpenMP directive. There is no overhead, because the execution path wasn't forked.
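One way to check this on a given setup is to ask the runtime how large the team inside a parallel region really is. The sketch below assumes the standard `omp_set_num_threads` and `omp_get_num_threads` calls; the helper name `team_size` is made up for illustration. It only shows that a single thread runs the region. It does not measure whether entering the region is entirely free, which may vary between runtimes.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the number of threads that actually execute a parallel region
// after `requested` threads have been asked for (hypothetical helper).
int team_size(int requested) {
    assert(requested >= 1);
    int size = 1;  // result when built without OpenMP: plain serial code
#ifdef _OPENMP
    omp_set_num_threads(requested);
    #pragma omp parallel
    {
        // Only one thread of the team writes the result.
        #pragma omp single
        size = omp_get_num_threads();
    }
#else
    (void)requested;  // unused in a non-OpenMP build
#endif
    return size;
}
```

Calling `team_size(1)` should report a team of one, i.e. just the main thread.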

DirkMausF
0

The thing about OpenMP is that the compiler does the work for you. It requires minimal modification to the sequential code and often gives reasonably good results if the tasks given to each thread are fairly large. I would suggest testing your code with Pthreads or C++11 std::thread and comparing the results.

Anas
0

OpenMP is something where the compiler does all the work. If the compiler knows the code is always going to be serial, it can quite legitimately skip all of the parallel bits.
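OpenMP also lets the code itself pick between the serial and parallel paths at run time through the `if` clause: when the condition is false, the region is executed by a team of one thread. A minimal sketch, assuming a row-major matrix stored in a flat vector (the function `row_sums` and the `allow_parallel` flag are hypothetical names):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sums each row of a row-major matrix with `cols` columns.
// When allow_parallel is false, the `if` clause keeps the loop on one thread.
std::vector<double> row_sums(const std::vector<double>& m, std::size_t cols,
                             bool allow_parallel) {
    assert(cols > 0 && m.size() % cols == 0);
    (void)allow_parallel;  // avoids an unused warning in non-OpenMP builds
    const long rows = static_cast<long>(m.size() / cols);
    std::vector<double> sums(static_cast<std::size_t>(rows), 0.0);

    #pragma omp parallel for schedule(static) if(allow_parallel)
    for (long r = 0; r < rows; ++r) {
        const std::size_t base = static_cast<std::size_t>(r) * cols;
        double acc = 0.0;
        for (std::size_t c = 0; c < cols; ++c)
            acc += m[base + c];
        sums[static_cast<std::size_t>(r)] = acc;
    }
    return sums;
}
```

Both settings of the flag give the same sums, so the two paths can be timed against each other directly.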

TBB, as I understand it, is basically just a library. It is always going to need your algorithm decorated with the necessary parts to run it in parallel as well as serially.

Flexo
  • So do you mean that if I set a single thread in OpenMP, its implementation is intelligent enough to skip the OpenMP pragmas and run the code serially? – Akhtar Ali Sep 05 '11 at 13:22
  • If you set it at compile time, which from your question I think you have, then yes, it is possible and likely. – Flexo Sep 05 '11 at 13:52
  • omp_set_num_threads(1) is part of the OpenMP library, not an OpenMP compiler directive, so it is not skipped by the compiler. – DirkMausF Jun 14 '12 at 11:05
  • @DirkMausF sure, but it could be inline in a header file if an implementation chose to do it that way, or even "magic" in some other way. So it can get inlined and optimised away if the compiler can prove it has no effect. The compiler can also choose to produce two "flavours" of code, one parallel and one serial, in the same output if it wants to, and pick one of them at run time with a single (fairly cheap) branch. There are a lot more options available with OpenMP than with some library where you just have the pre-compiled code. – Flexo Jun 14 '12 at 11:11
  • Why should the compiler do so? It's enough to check whether only one thread is configured and then execute the code unparallelized... one if...then, that's no overhead compared to rewriting the whole compiler... ;) – DirkMausF Jun 14 '12 at 11:22