
Bottom line

How can I make sure that the threadprivate instances are properly destructed?

Background

When answering this question, I came across an oddity when using the Intel C++ 15.0 compiler in VS2013: when a global variable is declared threadprivate, the slave threads' copies are not destructed. I started looking for ways to force their destruction. At this site, they say that adding an OMP barrier should help. It doesn't (see MCVE). I tried setting the OMP blocktime to 0, hoping the threads would not stick around after the parallel region; that didn't help either. I also tried adding some dummy calculations to delay the main thread and give the other threads time to die. Still no luck.

MCVE:

#include <iostream>
#include <omp.h>
#include <cmath>  // for sin()

class myclass {
    int _n;
public:
    myclass(int n) : _n(n) { std::cout << "int c'tor\n"; }

    myclass() : _n(0) { std::cout << "def c'tor\n"; }

    myclass(const myclass & other) : _n(other._n)
                    { std::cout << "copy c'tor\n"; }

    ~myclass() { std::cout << "bye bye\n"; }

    void print() { std::cout << _n << "\n"; }

    void add(int t) { _n += t; }
};

myclass globalClass;
#pragma omp threadprivate (globalClass)

int main(int argc, char* argv[])
{
    std::cout << "\nBegninning main()\n";

    // Intel-specific extension: idle threads go to sleep immediately
    // instead of spin-waiting (this does not terminate them)
    kmp_set_blocktime(0);

#pragma omp parallel
    {
        globalClass.add(omp_get_thread_num());
        globalClass.print();
#pragma omp barrier
        //Barrier doesn't help
    }

    // Try some busy work, takes a few seconds
    double dummy = 0.0;
    for (int i = 0; i < 199999999; i++)
    {
        dummy += (sin(i + 0.1));
    }
    std::cout << dummy << "\n";
    
    std::cout << "Exiting main()\n";
    return 0;
}

The output is

def c'tor

Begninning main()
def c'tor
1
def c'tor
3
def c'tor
2
0
1.78691
Exiting main()
bye bye

There is only one "bye bye" where I would have expected four.

Update

Following Kyle's quote of the OMP 4.0 standard, which states:

The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program.

I added a static instance of the class (both global and local) to see if its destructor gets called. It does, both for the local and the global case. So the question still stands.

Avi Ginsburg

1 Answer

This is documented behavior (though I have no idea why this decision was made).

From the MSDN entry on threadprivate (with some formatting changes):

A threadprivate variable of a destructable type is not guaranteed to have its destructor called.

...

Users have no control as to when the threads constituting the parallel region will terminate. If those threads exist when the process exits, the threads will not be notified about the process exit, and the destructor will not be called for threaded_var on any thread except the one that exits (here, the primary thread). So code should not count on proper destruction of threadprivate variables.

The OpenMP 4.0 standard leaves both the timing and the order of destructor calls unspecified. From section 2.14.2:

Page 151, lines 7-9:

The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program.

Page 152, lines 8-10:

The order in which any constructors for different threadprivate variables of class type are called is unspecified. The order in which any destructors for different threadprivate C++ variables of class type are called is unspecified.

Personally, it seems to me that Microsoft may be taking this as too much of a blank check; destructor order being unspecified seems substantially different from failing to guarantee at all that a destructor will be called. The way static variables are handled in the base language (C++ in this case) is that destructors are guaranteed to be called. So I think MSVC is nonconforming (to both the C++ standard and the OMP standard), but since I'm not a language lawyer, don't take my word for it.
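For comparison, here is the closest base-language analogue for per-thread storage in C++11. This is only an illustrative sketch, not part of the quoted documentation or of any OpenMP runtime: it uses `thread_local` and `std::thread` instead of OpenMP, and the names `tracked`, `tl_obj`, and `destroyed_after_join` are hypothetical. Each thread's copy of a `thread_local` object is destroyed as that thread exits, so every copy has been destroyed by the time `join()` returns.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Counts destructor calls across all threads (hypothetical helper).
std::atomic<int> g_dtor_calls{0};

struct tracked {
    int n = 0;
    ~tracked() { g_dtor_calls.fetch_add(1); }
};

// One copy per thread, roughly what threadprivate promises for OpenMP threads.
thread_local tracked tl_obj;

// Starts `count` threads that each touch their own copy, joins them, and
// returns how many destructors have run once all joins have returned.
int destroyed_after_join(int count) {
    g_dtor_calls.store(0);
    std::vector<std::thread> pool;
    for (int i = 0; i < count; ++i) {
        // First use inside the thread creates that thread's copy.
        pool.emplace_back([i] { tl_obj.n += i; });
    }
    for (auto& t : pool) {
        t.join();  // the thread's thread_local destructors run before it completes
    }
    return g_dtor_calls.load();
}
```

Called from `main()`, `destroyed_after_join(4)` returns 4, one destructor per worker thread. The main thread never touches `tl_obj`, so its copy is never constructed and is not counted. Whether a given OpenMP runtime builds threadprivate on this same mechanism is implementation-specific.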

With that said, it's hard to see how this could have serious repercussions. You certainly should not see any memory leaks, since threadprivate storage space should be allocated/deallocated all at once when the thread is created/destroyed. (And if your threadprivate instances have references to non-threadprivate memory that they manage, well... that doesn't seem like it will work in the first place.)
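If a threadprivate object does own something that must be cleaned up, one mitigation is to stop relying on the destructor altogether. This is a sketch of that idea, not something MSDN or the OpenMP standard prescribes: keep the threadprivate type trivially constructible and destructible, and move the cleanup into an explicit `release()` that is called from a final parallel region before leaving `main()`. All names (`worker_state`, `tp_state`, `g_released`, `use_then_release`, `thread_num`, `team_size`) are hypothetical. The second region only reaches the *same* per-thread copies if threadprivate data persists between the two regions, which OpenMP guarantees only under conditions such as no nesting, the same team size, and dynamic adjustment disabled; the code tries to meet those. Without OpenMP the pragmas are ignored and it runs on one thread.

```cpp
#include <atomic>

#ifdef _OPENMP
#include <omp.h>
inline int thread_num() { return omp_get_thread_num(); }
inline int team_size() { return omp_get_num_threads(); }
#else
inline int thread_num() { return 0; }
inline int team_size() { return 1; }
#endif

// Counts how many per-thread copies have had their explicit cleanup run.
std::atomic<int> g_released{0};

// Trivial type: zero-initialized, no destructor that could be skipped.
struct worker_state {
    int n;          // per-thread accumulator
    bool released;  // guards against releasing the same copy twice

    void add(int t) { n += t; }

    // Explicit cleanup hook: put here what the destructor would have done.
    void release() {
        if (!released) {
            released = true;
            g_released.fetch_add(1);  // stand-in for closing files, freeing buffers, etc.
        }
    }
};

worker_state tp_state;  // static storage, so every copy starts zeroed
#pragma omp threadprivate(tp_state)

// Uses the threadprivate copies in one region, then releases them in a second
// region with the same team size. Returns the team size (1 without OpenMP).
int use_then_release() {
    int team = 1;  // shared by default inside the parallel regions
#ifdef _OPENMP
    omp_set_dynamic(0);  // keep the team, and hence its copies, stable
#endif
#pragma omp parallel
    {
        tp_state.add(thread_num());
#pragma omp single
        team = team_size();
    }
#pragma omp parallel num_threads(team)
    {
        tp_state.release();  // each thread cleans up its own copy
    }
    return team;
}
```

Called once from `main()` just before returning, `use_then_release()` should leave `g_released` equal to the returned team size. The trade-off is that cleanup becomes the programmer's responsibility again, which is in line with MSDN's advice not to count on threadprivate destructors.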

Kyle Strand
  • I had seen that. It's kinda funny, cuz VS won't compile my example. Is that part of the omp standard? I haven't found something comparable for the Intel compiler (the only compiler I found/have that works). – Avi Ginsburg Sep 03 '15 at 19:38
  • @AviGinsburg I just looked it up, and (in version 4) it's unspecified. – Kyle Strand Sep 03 '15 at 19:42
  • I've added the reference. – Kyle Strand Sep 03 '15 at 19:50
  • [Source](http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf) From page 181: "The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program." Page 182: "The order in which any destructors for different threadprivate C++ variables of class type are called is unspecified" – Avi Ginsburg Sep 03 '15 at 19:52
  • You added the same info... I agree with your understanding that *order* seems to imply that they'll be called. But it should be a moot point, as the compiler is Intel C++ and not cl. – Avi Ginsburg Sep 03 '15 at 19:56
  • @AviGinsburg Huh. Well, your first quote (which I missed) was relevant, too, so I've added it as well. – Kyle Strand Sep 03 '15 at 20:03