I have a fairly complex program that runs into strange behavior when built with OpenMP in MSVC 2010 Debug mode. I have tried my best to construct the following minimal working example (though it is not really minimal), which mimics the structure of the real program.
#include <vector>
#include <cassert>

// A class that takes a pointer to the whole collection and a position. It only
// allows access to the element at that position, plus read-only access to
// query some information about the whole collection
class Element
{
public :
    Element (int i, std::vector<double> *src) : i_(i), src_(src) {}

    int i () const {return i_;}
    int size () const {return src_->size();}

    double src () const {return (*src_)[i_];}
    double &src () {return (*src_)[i_];}

private :
    const int i_;
    std::vector<double> *const src_;
};

// A Base class for dispatch
template <typename Derived>
class Base
{
protected :
    void eval (int dim, Element elem, double *res)
    {
        // Dispatch the call from Evaluator<Derived>
        eval_dispatch(dim, elem, res, &Derived::eval); // Point (2)
    }

private :
    // Resolve to Derived non-static member eval(...)
    template <typename D>
    void eval_dispatch(int dim, Element elem, double *res,
            void (D::*) (int, Element, double *))
    {
#ifndef NDEBUG // Assert that this is a Derived object
        assert((dynamic_cast<Derived *>(this)));
#endif
        static_cast<Derived *>(this)->eval(dim, elem, res);
    }

    // Resolve to Derived static member eval(...)
    void eval_dispatch(int dim, Element elem, double *res,
            void (*) (int, Element, double *))
    {
        Derived::eval(dim, elem, res); // Point (3)
    }

    // Resolve to Base member eval(...): Derived has no such member of its own
    // but is derived from Base
    void eval_dispatch(int dim, Element elem, double *res,
            void (Base::*) (int, Element, double *))
    {
        // Default behavior: do nothing
    }
};

// A middle-man that provides the interface operator() and calls Base::eval,
// which dispatches to either the default behavior or Derived::eval
template <typename Derived>
class Evaluator : public Base<Derived>
{
public :
    void operator() (int N , int dim, double *res)
    {
        std::vector<double> src(N);
        for (int i = 0; i < N; ++i)
            src[i] = i;

#pragma omp parallel for default(none) shared(N, dim, src, res)
        for (int i = 0; i < N; ++i) {
            assert(i < N);
            double *r = res + i * dim;
            Element elem(i, &src);
            assert(elem.i() == i); // Point (1)
            this->eval(dim, elem, r);
        }
    }
};

// Client code, which implements eval
class Implementation : public Evaluator<Implementation>
{
public :
    static void eval (int dim, Element elem, double *r)
    {
        assert(elem.i() < elem.size()); // Point (4): this is where the program fails
        for (int d = 0; d != dim; ++d)
            r[d] = elem.src();
    }
};

int main ()
{
    const int N = 500000;
    const int Dim = 2;
    double *res = new double[N * Dim];
    Implementation impl;
    impl(N, Dim, res);
    delete [] res;

    return 0;
}

The real program does not use vector, etc., but Element, Base, Evaluator and Implementation capture its basic structure. When built in Debug mode and run under the debugger, the assertion at Point (4) fails.
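To make the dispatch idea easier to follow in isolation, here is a stripped-down sketch of it. It is not the real program: the names DispatchBase, call, WithStatic, WithMember and WithNone are hypothetical, everything is public for brevity, and eval takes a single int. The overload is chosen purely by the type of the expression &Derived::eval.

```cpp
#include <cassert>
#include <string>

// Hypothetical names throughout; this only illustrates the overload selection.
template <typename Derived>
class DispatchBase
{
public:
    // Entry point: the type of &Derived::eval picks the overload below
    std::string call(int x)
    {
        return dispatch(x, &Derived::eval);
    }

    // Default eval, found by name lookup when Derived declares none of its own
    void eval(int) {}

private:
    // Chosen when Derived declares a non-static member eval
    template <typename D>
    std::string dispatch(int x, void (D::*)(int))
    {
        static_cast<Derived *>(this)->eval(x);
        return "non-static";
    }

    // Chosen when Derived declares a static member eval
    std::string dispatch(int x, void (*)(int))
    {
        Derived::eval(x);
        return "static";
    }

    // Chosen when Derived only inherits eval from this base class
    std::string dispatch(int, void (DispatchBase::*)(int))
    {
        return "default";
    }
};

struct WithStatic : DispatchBase<WithStatic> { static void eval(int) {} };
struct WithMember : DispatchBase<WithMember> { void eval(int) {} };
struct WithNone : DispatchBase<WithNone> {};
```

In this sketch, calling call(1) on a WithStatic object takes the function-pointer overload, which corresponds to the path through Point (3) in the example above.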
Here are some more details from the debugger, obtained by viewing the call stack. On entering Point (1), the local i has the value 371152, which is fine. The variable elem does not show up in the frame, which is a little strange, but since the assertion at Point (1) does not fail, I guess it is fine.
Then crazy things happen. The call to eval by Evaluator resolves to its base class, so Point (2) is executed. At this point, the debugger shows that elem has i_ = 499999, which is no longer the i used to create elem in Evaluator before passing it by value to Base::eval. Next, the call resolves to Point (3); this time elem has i_ = 501682, which is out of range. This is also the value when the call reaches Point (4), where the assertion fails.
It looks like whenever an Element object is passed by value, the values of its members change. Rerunning the program multiple times shows similar behavior, though it is not always reproducible. In the real program, this class is designed to act like an iterator over a collection of particles, though the thing it iterates over is not exactly a container. The point is that it is small enough to be passed by value efficiently. The client code therefore knows that it has its own copy of Element rather than a reference or pointer, and does not need to worry (much) about thread safety as long as it sticks to Element's interface, which only provides write access to a single position of the whole collection.
I tried the same program with GCC and Intel ICPC. Nothing unexpected happens, and in the real program correct results were produced.
Did I use OpenMP incorrectly somewhere? I thought that the elem created at about Point (1) should be local to the loop body. In addition, no value bigger than N is produced anywhere in the program, so where do these new values come from?
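To state my expectation concretely, here is a smaller sketch of what I believe should hold: an object declared inside the body of a parallel for loop is private to the executing thread, so a copy passed by value should always carry the index it was built with. The names Handle, read_by_value and count_mismatches are hypothetical and only stand in for Element and eval. The check is done in code rather than by reading values in the debugger.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for Element: a small object that is copied by value
struct Handle
{
    int index;
    const std::vector<double> *src;
};

// Receives its own copy of the handle, as Implementation::eval does
inline double read_by_value(Handle h)
{
    return (*h.src)[h.index];
}

// Returns the number of iterations whose copied handle read the wrong element
inline int count_mismatches(int n)
{
    std::vector<double> src(n);
    for (int i = 0; i < n; ++i)
        src[i] = i;

    // One flag per iteration, so threads never write to the same element
    std::vector<int> bad(n, 0);

#pragma omp parallel for default(none) shared(n, src, bad)
    for (int i = 0; i < n; ++i) {
        Handle h = {i, &src}; // declared in the loop body, hence thread-private
        bad[i] = (read_by_value(h) != static_cast<double>(i)) ? 1 : 0;
    }

    int total = 0;
    for (int i = 0; i < n; ++i)
        total += bad[i];

    return total;
}
```

If my understanding of OpenMP is right, count_mismatches should return 0 for any n, whether or not the pragma is honored by the compiler.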
Edit
I looked more carefully in the debugger. It shows that while elem.i_ changes when elem is passed by value, the pointer elem.src_ does not change with it. It holds the same memory address after being passed by value.
Edit: Compiler flags
I used CMake to generate the MSVC solution. I have to confess I have no idea how to use MSVC or Windows in general. The only reason I am using it is that I know a lot of people use it, so I want to test my library against it and work around any problems.
For the CMake-generated project, using the Visual Studio 10 Win64 target, the compiler flags appear to be
/DWIN32 /D_WINDOWS /W3 /Zm1000 /EHsc /GR /D_DEBUG /MDd /Zi /Ob0 /Od /RTC1
And here is the command line found in Property Pages-C/C++-Command Line
/Zi /nologo /W3 /WX- /Od /Ob0 /D "WIN32" /D "_WINDOWS" /D "_DEBUG" /D "CMAKE_INTDIR=\"Debug\"" /D "_MBCS" /Gm- /EHsc /RTC1 /MDd /GS /fp:precise /Zc:wchar_t /Zc:forScope /GR /openmp /Fp"TestOMP.dir\Debug\TestOMP.pch" /Fa"Debug" /Fo"TestOMP.dir\Debug\" /Fd"C:/Users/Yan Zhou/Dropbox/Build/TestOMP/build/Debug/TestOMP.pdb" /Gd /TP /errorReport:queue
Is there anything suspicious here?
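In case it helps rule out the build configuration, here is a small sketch for confirming from inside the program that the /openmp flag actually took effect. The function name openmp_version is hypothetical. _OPENMP is the macro that the OpenMP specification requires a conforming compiler to define, as a yyyymm date, when OpenMP is enabled; a compiler implementing OpenMP 2.0 for C/C++ should report 200203.

```cpp
#include <cassert>

// Returns the value of the _OPENMP macro (yyyymm of the supported
// specification), or 0 if the translation unit was built without OpenMP
inline long openmp_version()
{
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}
```

Printing the result of openmp_version() at the start of main would show whether the Debug configuration is really compiling the pragma or silently ignoring it.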