
I work with OpenMP under GCC 6.2.0 and C++1z. I tried to use thread_local objects that are created lazily inside the threads that need them. The thread_local objects mostly work, but it seems that the destructor is called for only one thread. The following code reproduces the issue. Is the code relying on an unsupported feature, or is this possibly an issue with the GCC implementation?

#include <iostream>
#include <memory>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

std::mutex g_cerr_mutex;  // serializes writes to std::cerr

struct X {
    std::string name_;  // id of the thread that constructed this object

    X() {
        std::stringstream ss;
        ss << std::this_thread::get_id();
        name_ = ss.str();
    }

    ~X() noexcept {
        std::lock_guard<std::mutex> guard(g_cerr_mutex);
        std::cerr << "Destructing: " << name_ << std::endl;
    }
};

int main(void) {
    // One lazily created X per thread.
    static thread_local std::unique_ptr<X> ptr;

    #pragma omp parallel for
    for (unsigned x = 0; x < 32; ++x) {
        if (!ptr) {
            ptr.reset(new X);
        }
        std::lock_guard<std::mutex> guard(g_cerr_mutex);
        std::cerr << std::this_thread::get_id() << " : " << static_cast<void*>(ptr.get()) << std::endl;
    }

    return 0;
}

The code is compiled and linked on Linux with a 4-core i7 CPU. The build commands are:

$ g++ -std=gnu++1z -fopenmp -Wall -Werror -Ofast -pthread -c omp.cpp 
$ g++ -std=gnu++1z -fopenmp -Wall -Werror -Ofast -pthread omp.o -o omp

The output of the program looks like this:

139868398491392 : 0x7f35780008c0 
139868398491392 : 0x7f35780008c0 
139868398491392 : 0x7f35780008c0 
139868398491392 : 0x7f35780008c0 
139868453738496 : 0x7bc2d0 
139868453738496 : 0x7bc2d0 
139868453738496 : 0x7bc2d0 
139868453738496 : 0x7bc2d0 
139868423669504 : 0x7f35880008c0 
139868423669504 : 0x7f35880008c0 
139868423669504 : 0x7f35880008c0 
139868423669504 : 0x7f35880008c0 
139868406884096 : 0x7f35700008c0 
139868406884096 : 0x7f35700008c0 
139868406884096 : 0x7f35700008c0 
139868406884096 : 0x7f35700008c0 
139868432062208 : 0x7f35a00008c0 
139868432062208 : 0x7f35a00008c0 
139868432062208 : 0x7f35a00008c0 
139868432062208 : 0x7f35a00008c0 
139868390098688 : 0x7f35900008c0 
139868390098688 : 0x7f35900008c0 
139868390098688 : 0x7f35900008c0 
139868390098688 : 0x7f35900008c0 
139868415276800 : 0x7f35980008c0 
139868415276800 : 0x7f35980008c0 
139868415276800 : 0x7f35980008c0 
139868415276800 : 0x7f35980008c0 
139868381705984 : 0x7f35800008c0 
139868381705984 : 0x7f35800008c0 
139868381705984 : 0x7f35800008c0 
139868381705984 : 0x7f35800008c0 
Destructing: 139868453738496

The log shows eight distinct X objects, one per thread, but only one destructor call, for thread 139868453738496.
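To check the constructor/destructor balance without reading it off the log, the object can count both events. This is only a minimal sketch and not part of the program above: `Counted`, `g_constructed`, `g_destroyed`, and `run_thread_local_loop` are hypothetical names introduced for illustration, and the loop is the same `thread_local` pattern as in the question.

```cpp
#include <atomic>
#include <memory>

// Hypothetical counters, introduced only for this sketch.
std::atomic<int> g_constructed{0};
std::atomic<int> g_destroyed{0};

// Stand-in for X that records every construction and destruction.
struct Counted {
    Counted() { ++g_constructed; }
    ~Counted() noexcept { ++g_destroyed; }
};

// Runs the same lazily initialized thread_local pattern as the question
// and returns how many Counted objects have been constructed so far.
// Build with -fopenmp; without it the pragma is ignored and the loop
// runs on the calling thread only.
int run_thread_local_loop() {
    static thread_local std::unique_ptr<Counted> ptr;

    #pragma omp parallel for
    for (unsigned x = 0; x < 32; ++x) {
        if (!ptr) {
            ptr.reset(new Counted);  // first iteration on this thread
        }
    }
    return g_constructed.load();
}
```

Calling `run_thread_local_loop()` from `main` and printing both counters gives the number of objects created per run. Destructors of `thread_local` objects only run when their thread exits, so `g_destroyed` cannot exceed `g_constructed` and may stay at zero until the process shuts down.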

Mike Kinghan
faramir
  • How many constructors are called? – Karoly Horvath Jan 11 '17 at 13:27
  • Thread local and thread pools is a surefire way to create memory leaks and other fun problems. – Voo Jan 11 '17 at 14:01
  • OpenMP has its own means of declaring thread-local variables using `#pragma omp threadprivate(...)`. It is often (but not always!) implementation-compatible with the language constructs, e.g., the same TLS mechanism is used, but the semantics differ. Do not mix OpenMP and C++ threading! – Hristo Iliev Jan 12 '17 at 09:45
  • BTW, have you tried running the same code with `std::thread`? No destructor is called at all. In the OpenMP case thread 0 is the main thread, therefore you get the destructor called once at the end of the serial part. GCC uses ELF TLS to implement thread-local variables, which, as far as I recall, doesn't (or at least used to not) support constructors and destructors, therefore GCC only allows PODs as thread-local variables. This is exposed in GCC's OpenMP `threadprivate` - if you try something like `#pragma omp threadprivate(ptr)` instead of `thread_local`, GCC throws an error. – Hristo Iliev Jan 12 '17 at 10:49
  • Hristo, you are right. With std::thread, it doesn't call destructor either, only constructors... – faramir Jan 13 '17 at 12:16
  • Karoly, it seems that all constructors (8 in my case) are called. – faramir Jan 13 '17 at 12:16

1 Answer


Mixing C++ language threading features and OpenMP is not well-defined (see related questions). Basically, OpenMP only refers to C++98, so the interaction between OpenMP and thread_local is not guaranteed to be safe or portable. It is usually assumed to work because implementations do the right thing, but in this case they apparently do not. BTW: I can reproduce the same issue with the Intel compiler and OpenMP runtime.

The safe and portable approach is to stick to either pure C++17 or OpenMP. With OpenMP, this means declaring ptr as private:

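static std::unique_ptr<X> ptr;
#pragma omp parallel private(ptr)
{
    ptr.reset();  // start every thread's private copy empty
    #pragma omp for
    for (unsigned x = 0; x < 32; ++x) {
        if (!ptr) {
            ptr.reset(new X);
        }
        // ... use *ptr as in the question ...
    }
}  // each thread's private copy of ptr is destroyed here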

Note that the reset is necessary, otherwise the value of ptr is undefined. You cannot use firstprivate because std::unique_ptr has no copy constructor.
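The private-variable approach can be checked with a small self-contained sketch. It is an illustration under assumptions, not a definitive implementation: `Tracked`, `g_made`, `g_freed`, and `private_ptr_is_balanced` are hypothetical names, and `ptr` is a plain local instead of a static so that the check also holds when the pragmas are ignored.

```cpp
#include <atomic>
#include <memory>

// Hypothetical counters used only to compare constructions and destructions.
std::atomic<int> g_made{0};
std::atomic<int> g_freed{0};

// Stand-in for X that records its lifetime events.
struct Tracked {
    Tracked() { ++g_made; }
    ~Tracked() noexcept { ++g_freed; }
};

// Returns true if every Tracked object created inside the parallel region
// has been destroyed by the time the region (and the local ptr) is gone.
bool private_ptr_is_balanced() {
    {
        std::unique_ptr<Tracked> ptr;  // original variable, stays empty under OpenMP

        #pragma omp parallel private(ptr)
        {
            ptr.reset();  // as in the answer: make the private copy explicitly empty
            #pragma omp for
            for (unsigned x = 0; x < 32; ++x) {
                if (!ptr) {
                    ptr.reset(new Tracked);  // one object per participating thread
                }
            }
        }  // private copies go out of scope here and release their objects
    }      // without -fopenmp, the single local ptr is released here instead
    return g_made.load() == g_freed.load();
}
```

The design choice here is to tie the lifetime of each per-thread object to the parallel region rather than to the lifetime of the worker thread, which the OpenMP runtime may keep alive in its pool after the region ends.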

Zulan
  • 21,896
  • 6
  • 49
  • 109