Bottom line
How can I make sure that the threadprivate instances are properly destructed?
Background
When answering this question I came across an oddity when using the Intel C++ 15.0 compiler in VS2013. When a global variable is declared threadprivate, the worker threads' copies are not destructed. I started looking for ways to force their destruction. At this site, they say that adding an OMP barrier should help. It doesn't (see MCVE). I also tried setting the OMP blocktime to 0 so that the threads would not stick around after the parallel region; that didn't help either. Finally, I tried adding some dummy calculations that delay the main thread, giving the other threads time to die. That still didn't help.
MCVE:
#include <cmath>     // std::sin
#include <iostream>
#include <omp.h>

class myclass {
    int _n;
public:
    myclass(int n) : _n(n) { std::cout << "int c'tor\n"; }
    myclass() : _n(0) { std::cout << "def c'tor\n"; }
    myclass(const myclass & other) : _n(other._n)
    { std::cout << "copy c'tor\n"; }
    ~myclass() { std::cout << "bye bye\n"; }
    void print() { std::cout << _n << "\n"; }
    void add(int t) { _n += t; }
};

// One copy per thread; each copy is default-constructed
myclass globalClass;
#pragma omp threadprivate (globalClass)

int main(int argc, char* argv[])
{
    std::cout << "\nBegninning main()\n";

    // Kill the threads immediately (Intel-specific extension)
    kmp_set_blocktime(0);

    #pragma omp parallel
    {
        globalClass.add(omp_get_thread_num());
        globalClass.print();
        #pragma omp barrier
        // Barrier doesn't help
    }

    // Try some busy work, takes a few seconds
    double dummy = 0.0;
    for (int i = 0; i < 199999999; i++)
    {
        dummy += (std::sin(i + 0.1));
    }
    std::cout << dummy << "\n";

    std::cout << "Exiting main()\n";
    return 0;
}
The output is
def c'tor
Begninning main()
def c'tor
1
def c'tor
3
def c'tor
2
0
1.78691
Exiting main()
bye bye
There is only one "bye bye" where I would have expected four.
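To make the mismatch measurable rather than something read off the console, the constructor and destructor calls can be counted. The following is only a sketch: `CountedInstance` and its member functions are hypothetical names that are not part of the MCVE, and the counters are atomic on the assumption that constructors may run concurrently on several threads.

```cpp
#include <atomic>

// Hypothetical helper, not part of the original MCVE: counts constructions
// and destructions so that missing destructor calls show up as a number.
class CountedInstance {
public:
    CountedInstance() { ++constructed(); }
    CountedInstance(const CountedInstance&) { ++constructed(); }
    ~CountedInstance() { ++destroyed(); }

    // Function-local statics avoid any dependence on the initialization
    // order of global objects that use this class.
    static std::atomic<int>& constructed() {
        static std::atomic<int> n{0};
        return n;
    }
    static std::atomic<int>& destroyed() {
        static std::atomic<int> n{0};
        return n;
    }

    // Instances that have been constructed but not (yet) destroyed.
    static int alive() {
        return constructed().load() - destroyed().load();
    }
};
```

Adding the same counters to `myclass` (or deriving it from such a helper) would turn the "one bye bye instead of four" observation into a count that can be printed at any point in the program.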
Update
Following Kyle's quote from the OpenMP 4.0 standard, which states:
The storage of all copies of a threadprivate variable is freed according to how static variables are handled in the base language, but at an unspecified point in the program.
I added a static instance of the class (both global and local) to see if its destructor gets called. It does, both for the local and the global case. So the question still stands.
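The static-instance check described above can be sketched roughly as follows. The names `Tracked` and `localStatic()` are placeholders made up for illustration; the actual test used additional static instances of `myclass`.

```cpp
#include <iostream>

// Same logging idea as myclass in the MCVE, reduced to what the check needs.
// Tracked is a hypothetical name used only in this sketch.
class Tracked {
    const char* name_;
public:
    explicit Tracked(const char* name) : name_(name) {
        std::cout << name_ << " c'tor\n";
    }
    ~Tracked() { std::cout << name_ << " bye bye\n"; }
    const char* name() const { return name_; }
};

// Global static instance: constructed before main() starts.
static Tracked globalStatic("global static");

// Local static instance: constructed on the first call to this function.
const Tracked& localStatic() {
    static Tracked instance("local static");
    return instance;
}
```

Calling `localStatic()` once from `main()` is enough to create the local instance. On normal program exit both destructors run, which matches the behaviour reported above for ordinary static variables.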