45

The OpenMP standard only considers C++98 (ISO/IEC 14882:1998). This means there is no standard that supports using OpenMP with C++03, let alone C++11. Thus any program that uses OpenMP together with a newer C++ dialect operates outside of the standards: even if it works under certain conditions, it is unlikely to be portable and certainly never guaranteed to be.

The situation is even worse with C++11, which has its own multi-threading support that is quite likely to clash with OpenMP in certain implementations.

So, how safe is it to use OpenMP with C++03 and C++11?

Can one safely use C++11 multi-threading as well as OpenMP in one and the same program but without interleaving them (i.e. no OpenMP statement in any code passed to C++11 concurrent features and no C++11 concurrency in threads spawned by OpenMP)?

I'm particularly interested in the situation where I first call some code using OpenMP and then some other code using C++11 concurrency on the same data structures.
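
For concreteness, a minimal sketch of the pattern I mean (made-up names, just the shape of the code; built with e.g. -fopenmp and -pthread):

```cpp
#include <thread>
#include <vector>

void openmp_phase(std::vector<double>& data)
{
    // OpenMP only: the parallel region ends (and its team joins)
    // before the C++11 phase starts.
    #pragma omp parallel for
    for (long i = 0; i < (long)data.size(); ++i)
        data[i] *= 2.0;
}

void cxx11_phase(std::vector<double>& data)
{
    // C++11 threads only, on the same data, strictly afterwards.
    std::thread t([&data] {
        for (auto& x : data) x += 1.0;
    });
    t.join();
}

int main()
{
    std::vector<double> data(1000, 1.0);
    openmp_phase(data);  // no C++11 concurrency in here
    cxx11_phase(data);   // no OpenMP in here
    return 0;
}
```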

Walter
  • Yes, *yes*, **yes**, a thousand times YES! Horrible, horrible, preprocessor hack that integrates poorly with the language, please die! (Disclaimer, I’ve written a library on top of OpenMP and I’ve written a master thesis about this; I know at least superficially what I’m ranting about.) – Konrad Rudolph Dec 12 '12 at 10:31
  • Yes, but not for the reasons you've written; rather, I would ask what infrastructure actually supports this standard? If you are looking to perform massively parallel computations, I would look towards something that can be done on a cloud computing platform (even if not in C++); if you have to build your own cluster to use OpenMP, it isn't worth it. – Michael Aaron Safyan Dec 12 '12 at 10:35
  • @MichaelAaronSafyan I was obviously only talking about multi-threading, not about distributed computing. If you want that, you must use something else entirely. – Walter Dec 12 '12 at 10:38
  • Incidentally, “OpenMP” is written with a capital “O” … – Konrad Rudolph Dec 12 '12 at 10:43
  • Question title is a little inflammatory. Maybe rename to 'How can I safely use OpenMP?' and leave people to decide whether to abandon it. – Peter Wood Dec 12 '12 at 10:47
  • I am going to vote to close this as not constructive unless the "should abandon" bit gets edited out from the title. – NPE Dec 12 '12 at 11:22
  • @Walter Stack Overflow, as I understand it, isn't for inflamed discussions. Just letting you know this question in its current form might get closed. – Peter Wood Dec 12 '12 at 11:24
  • @Walter: With regards to "starting a good discussion", this is a Q+A site, not a discussion site. – NPE Dec 12 '12 at 11:24
  • @NPE yes, I'm waiting for a useful answer, e.g. from someone with experience in using both OpenMP and C++ concurrency. – Walter Dec 12 '12 at 11:27
  • Walter, I suggested an open question, a 'how' question, and you changed @R. Martinho Fernandes edit to be a closed question, with a 'yes' or 'no' answer. Open questions are more likely to lead to rounded answers, which help us make better decisions. – Peter Wood Dec 12 '12 at 11:33
  • On Linux, both OpenMP pragmas and C++11 threads are typically implemented via Pthreads. Is it really safe to mix them? I have no experience with these two; however, I once tried to use OpenMP with Intel TBB and it did not work (on Cray machines), even though the OpenMP and TBB parallel sections were strictly separated in the code. – Daniel Langr Apr 26 '16 at 10:34
  • TBB and OpenMP use entirely separate parallel internals, which don't cooperate, so they may be combined only by terminating a parallel region of one kind before entering one of the other. There is a default delay after the final barrier of an omp parallel completes, which is likely to kill performance if you rely on starting a TBB parallel region immediately afterwards. If you combine omp parallel with another threading model that is also based on pthreads, there is a chance of it working. As pointed out in the answer, there are possible clashes which may be difficult to overcome. – tim18 Nov 29 '18 at 11:46

5 Answers

27

Walter, I believe I not only told you the current state of things in that other discussion, but also provided you with information directly from the source (i.e. from my colleague who is part of the OpenMP language committee).

OpenMP was designed as a lightweight data-parallel addition to Fortran and C, later extended to C++ idioms (e.g. parallel loops over random-access iterators) and to task parallelism with the introduction of explicit tasks. It is meant to be portable across as many platforms as possible and to provide essentially the same functionality in all three languages. Its execution model is quite simple: a single-threaded application forks teams of threads in parallel regions, runs some computational tasks inside them, and then joins the teams back into serial execution. Each thread from a parallel team can later fork its own team if nested parallelism is enabled.
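
A minimal sketch of that fork-join model (assuming an OpenMP compiler, e.g. built with -fopenmp):

```cpp
#include <omp.h>
#include <cstdio>

int main()
{
    std::printf("serial: one initial thread\n");

    #pragma omp parallel                 // fork a team of threads
    {
        std::printf("parallel: thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }                                    // implicit barrier; team joins

    std::printf("serial again\n");
    return 0;
}
```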

Since the main usage of OpenMP is in High Performance Computing (after all, its directive and execution model was borrowed from High Performance Fortran), the main goal of any OpenMP implementation is efficiency, not interoperability with other threading paradigms. On some platforms an efficient implementation can only be achieved if the OpenMP run-time is the only one in control of the process threads. There are also certain aspects of OpenMP that might not play well with other threading constructs, for example the limit on the number of threads set by OMP_THREAD_LIMIT when two or more concurrent parallel regions are forked.
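
For instance, with nested parallelism enabled and, say, OMP_THREAD_LIMIT=4 in the environment, the two inner regions below are active at the same time and have to share one thread budget regardless of what num_threads requests (a minimal sketch, assuming an implementation that honours the limit):

```cpp
#include <omp.h>
#include <cstdio>

int main()
{
    omp_set_nested(1);                     // allow nested parallel regions

    #pragma omp parallel num_threads(2)    // two threads, ...
    {
        #pragma omp parallel num_threads(4)  // ... each asking for four more
        {
            #pragma omp single
            std::printf("inner team of %d threads (thread limit: %d)\n",
                        omp_get_num_threads(), omp_get_thread_limit());
        }
    }
    return 0;
}
```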

The OpenMP standard itself does not strictly forbid using other threading paradigms, but neither does it standardise interoperability with them, so supporting such functionality is up to the implementers. This means that some implementations might provide safe concurrent execution of top-level OpenMP regions and some might not. The x86 implementers pledge to support it, maybe because most of them are also proponents of other execution models (e.g. Intel with Cilk and TBB, GCC with C++11, etc.) and x86 is usually considered an "experimental" platform (other vendors are usually much more conservative).

OpenMP 4.0 also goes no further than ISO/IEC 14882:1998 for the C++ features it employs (the SC12 draft is here). The standard now includes things like portable thread affinity, which definitely does not play well with other threading paradigms that provide their own binding mechanisms clashing with those of OpenMP. Once again, the OpenMP language is targeted at HPC (data- and task-parallel scientific and engineering applications), while the C++11 constructs are targeted at general-purpose computing. If you want fancy C++11 concurrent stuff, use C++11 only; if you really need to mix it with OpenMP, then stick to the C++98 subset of language features if you want to stay portable.

> I'm particularly interested in the situation where I first call some code using OpenMP and then some other code using C++11 concurrency on the same data structures.

There is no obvious reason why what you want should not be possible, but it is up to your OpenMP compiler and run-time. There are free and commercial libraries that use OpenMP for parallel execution (for example MKL), but there are always warnings (although sometimes hidden deep in their user manuals) about possible incompatibilities with multithreaded code, telling you what is possible and when. As always, this is outside the scope of the OpenMP standard and hence YMMV.

Hristo Iliev
  • just wanted your comments to become an answer ;). I'm actually interested in high-performance computing, but OpenMP (currently) does not serve my purpose well enough: it's not flexible enough (my algorithm is not loop based). – Walter Dec 12 '12 at 12:48
8

> I'm actually interested in high-performance computing, but OpenMP (currently) does not serve my purpose well enough: it's not flexible enough (my algorithm is not loop based)

Maybe you are really looking for TBB? That provides support for loop and task based parallelism, as well as a variety of parallel data structures, in standard C++, and is both portable and open-source.
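
For a flavour of it, here is a minimal sketch of TBB's loop parallelism (using the classic TBB headers; compile and link against the TBB library):

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <cstddef>
#include <vector>

int main()
{
    std::vector<double> v(1000000, 1.0);

    // TBB splits the iteration range into chunks and schedules them as
    // tasks on its own worker pool; no pragmas, just standard C++.
    tbb::parallel_for(
        tbb::blocked_range<std::size_t>(0, v.size()),
        [&](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                v[i] *= 2.0;
        });
    return 0;
}
```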

(Full disclosure: I work for Intel, who are heavily involved with TBB, though I don't actually work on TBB but on OpenMP :-); I am certainly not speaking for Intel!)

Jim Cownie
  • thanks for that answer. I will definitely look into TBB (once I have time). What kind of synchronisation techniques does it support? I would be interested in something similar to MPI's Reduce, i.e. a reduction between several running threads. Can this be done? – Walter Dec 14 '12 at 12:55
5

Like Jim Cownie, I’m also an Intel employee. I agree with him that Intel Threading Building Blocks (Intel TBB) might be a good option since it has loop-level parallelism like OpenMP but also other parallel algorithms, concurrent containers, and lower-level features too. And TBB tries to keep up with the current C++ standard.

And to clarify for Walter, Intel TBB includes a parallel_reduce algorithm as well as high-level support for atomics and mutexes.
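
As a sketch of the reduction Walter asked about, this is the functional form of tbb::parallel_reduce summing a vector (classic TBB headers assumed; compile and link against the TBB library):

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_reduce.h>
#include <cstddef>
#include <cstdio>
#include <vector>

int main()
{
    std::vector<double> v(1000000, 0.5);

    // Functional form of parallel_reduce: an identity value, a body that
    // folds a sub-range into a running sum, and a combiner that merges
    // the partial sums produced by different tasks.
    double sum = tbb::parallel_reduce(
        tbb::blocked_range<std::size_t>(0, v.size()), 0.0,
        [&](const tbb::blocked_range<std::size_t>& r, double acc) {
            for (std::size_t i = r.begin(); i != r.end(); ++i)
                acc += v[i];
            return acc;
        },
        [](double a, double b) { return a + b; });

    std::printf("sum = %f\n", sum);
    return 0;
}
```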

You can find the Intel® Threading Building Blocks User Guide, which gives an overview of the features in the library, at http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/tbb_userguide/title.htm

Mike Voss
  • I've tried Intel TBB. For some obscure reason, the whole code becomes so slow and memory hungry that it always throws a bad_alloc exception, whereas the OpenMP version runs in half a minute. – user Jun 03 '13 at 09:26
  • @user Since asking this question almost 3 years ago, I have had excellent experience with tbb and dumped OpenMP completely. This had two reasons. First, the obscure `#pragma` approach necessarily operates outside of the standards and hence is not portable. Second, my tbb-based code runs faster and tbb provides more flexibility for multi-threaded algorithms (perhaps more recent OpenMP versions are getting there too, though). – Walter Sep 22 '15 at 22:20
4

OpenMP is often (I am aware of no exceptions) implemented on top of Pthreads, so you can reason about some of the interoperability questions by thinking about how C++11 concurrency interoperates with Pthread code.

I don't know if oversubscription due to the use of multiple threading models is an issue for you, but this is definitely an issue for OpenMP. There is a proposal to address this in OpenMP 5. Until then, how you solve this is implementation defined. They are heavy hammers, but you can use OMP_WAIT_POLICY (OpenMP 4.5+), KMP_BLOCKTIME (Intel and LLVM), and GOMP_SPINCOUNT (GCC) to address this. I'm sure other implementations have something similar.

One issue where interoperability is a real concern is w.r.t. the memory model, i.e. how atomic operations behave. This is currently undefined, but you can still reason about it. For example, if you use C++11 atomics with OpenMP parallelism, you should be fine, but you are responsible for using C++11 atomics correctly from OpenMP threads.

Mixing OpenMP atomics and C++11 atomics is a bad idea. We (the OpenMP language committee working group charged with looking at OpenMP 5 base language support) are currently trying to sort this out. Personally, I think C++11 atomics are better than OpenMP atomics in every way, so my recommendation is that you use C++11 (or C11, or __atomic) for your atomics and leave #pragma omp atomic for the Fortran programmers.

Below is example code that uses C++11 atomics with OpenMP threads. It works as designed everywhere I have tested it.

Full disclosure: Like Jim and Mike, I work for Intel :-)

```cpp
#if defined(__cplusplus) && (__cplusplus >= 201103L)

#include <iostream>
#include <iomanip>

#include <atomic>
#include <chrono>
#include <cstdlib> // for atoi

#ifdef _OPENMP
# include <omp.h>
#else
# error No OpenMP support!
#endif

#ifdef SEQUENTIAL_CONSISTENCY
auto load_model  = std::memory_order_seq_cst;
auto store_model = std::memory_order_seq_cst;
#else
auto load_model  = std::memory_order_acquire;
auto store_model = std::memory_order_release;
#endif

int main(int argc, char * argv[])
{
    int nt = omp_get_max_threads();
#if 1  // force exactly two threads (the disabled branch would instead round an odd count down to even)
    if (nt != 2) omp_set_num_threads(2);
#else
    if (nt < 2)      omp_set_num_threads(2);
    if (nt % 2 != 0) omp_set_num_threads(nt-1);
#endif

    int iterations = (argc>1) ? atoi(argv[1]) : 1000000;

    std::cout << "thread ping-pong benchmark\n";
    std::cout << "num threads  = " << omp_get_max_threads() << "\n";
    std::cout << "iterations   = " << iterations << "\n";
#ifdef SEQUENTIAL_CONSISTENCY
    std::cout << "memory model = " << "seq_cst";
#else
    std::cout << "memory model = " << "acq-rel";
#endif
    std::cout << std::endl;

    std::atomic<int> left_ready  = {-1};
    std::atomic<int> right_ready = {-1};

    int left_payload  = 0;
    int right_payload = 0;

    #pragma omp parallel
    {
        int me      = omp_get_thread_num();
        /// 0=left 1=right
        bool parity = (me % 2 == 0);

        int junk = 0;  // accumulate received payloads so the loads cannot be optimized away

        /// START TIME
        #pragma omp barrier
        std::chrono::high_resolution_clock::time_point t0 = std::chrono::high_resolution_clock::now();

        for (int i=0; i<iterations; ++i) {

            if (parity) {

                /// send to left
                left_payload = i;
                left_ready.store(i, store_model);

                /// recv from right
                while (i != right_ready.load(load_model));
                //std::cout << i << ": left received " << right_payload << std::endl;
                junk += right_payload;

            } else {

                /// recv from left
                while (i != left_ready.load(load_model));
                //std::cout << i << ": right received " << left_payload << std::endl;
                junk += left_payload;

                ///send to right
                right_payload = i;
                right_ready.store(i, store_model);

            }

        }

        /// STOP TIME
        #pragma omp barrier
        std::chrono::high_resolution_clock::time_point t1 = std::chrono::high_resolution_clock::now();

        /// PRINT TIME
        std::chrono::duration<double> dt = std::chrono::duration_cast<std::chrono::duration<double>>(t1-t0);
        #pragma omp critical
        {
            std::cout << "total time elapsed = " << dt.count() << "\n";
            std::cout << "time per iteration = " << dt.count()/iterations  << "\n";
            std::cout << junk << std::endl;
        }
    }

    return 0;
}

#else  // C++11
#error You need C++11 for this test!
#endif // C++11
```
Jeff Hammond
  • thanks for your detailed answer. Since asking this question, I have moved on to use tbb entirely for multithreading, as it is sufficient for my needs (and more complete than C++ threads, as it comes with a task scheduler). I was particularly concerned about the lack of standard support for mixing OpenMP with recent C++. Doesn't this have any legal implications (for the properties of programs that do such mixing)? [btw, using `auto` instead of `std::chrono::high_resolution_clock::time_point` would make this code more readable] – Walter Oct 06 '16 at 17:05
  • I'm not aware of any laws against OpenMP + C++11, but IANAL :-) Technically, OpenMP doesn't support C++11, but if their usage is orthogonal, there isn't any reason for them to not work together. – Jeff Hammond Oct 06 '16 at 17:57
1

OpenMP 5.0 now defines the interaction with C++11. But in general, using anything from C++11 and later "may result in unspecified behavior".

This OpenMP API specification refers to ISO/IEC 14882:2011 as C++11. While future versions of the OpenMP specification are expected to address the following features, currently their use may result in unspecified behavior.

  • Alignment support
  • Standard layout types
  • Allowing move constructs to throw
  • Defining move special member functions
  • Concurrency
  • Data-dependency ordering: atomics and memory model
  • Additions to the standard library
  • Thread-local storage (see the sketch after this list)
  • Dynamic initialization and destruction with concurrency
  • C++11 library
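
To make the thread-local storage bullet concrete, a minimal sketch (my own illustration, not from the specification) of the kind of code whose behaviour the spec leaves unspecified, i.e. C++11 thread_local objects touched from OpenMP threads:

```cpp
#include <omp.h>
#include <cstdio>

// Each OpenMP worker should see its own counter, but whether (and when)
// construction and destruction happen on those threads is exactly what
// the OpenMP specification leaves unspecified here.
thread_local int counter = 0;

int main()
{
    #pragma omp parallel num_threads(4)
    {
        ++counter;
        std::printf("thread %d: counter = %d\n",
                    omp_get_thread_num(), counter);
    }
    return 0;
}
```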
Zulan