
I have a problem with OpenMP parallelization. I work on a self-written finite volume code (a numerical method for the solution of flow problems) and use OpenMP parallelization to improve performance.

Code Structure

The data structures I work on are cells, which store the actual simulation data, and faces, which compute the fluxes that change the cell data. A face is connected to two cells and a cell is connected to four faces (the code calls them interfaces).
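
To make the connectivity clearer, here is a minimal sketch of how these data could be laid out. This is not my actual class; the member names simply match the ones used in the code below, and idType stands for an integer index type:

#include <array>
#include <vector>

using idType = int;

struct ConservedVariable { double rho, rhoU, rhoV, rhoE; };

class GKSSolver
{
    // one entry per cell: current state and accumulated flux contributions
    std::vector<ConservedVariable> CellData;
    std::vector<ConservedVariable> CellUpdate;

    // one entry per face: the two adjacent cells, and flags telling whether
    // the flux of this face contributes to each of them
    std::vector< std::array<idType, 2> > Interface2Cell;
    std::vector< std::array<bool,   2> > InterfaceAdd2Cell;

    // ...
};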

The operations that work only per cell (without data from other cells or faces) parallelize well with #pragma omp parallel for. The parallelization of the flux computation, however, does not work, since this computation uses not only data of the face itself but also data of the adjacent cells.

This is the parallelization of the flux computation and cell update:

#pragma omp parallel for
for ( int id = 0; id < numberOfInterfaces; ++id )
    computeFlux(id);

#pragma omp parallel for
for ( int id = 0; id < numberOfCells; ++id )
    if ( !isGhostCell(id) )
        updateCell(id); 

computeFlux() uses the data from CellData to compute the contribution of one face to a cell. These contributions are accumulated in CellUpdate, since multiple faces contribute to each cell. Only after all faces have contributed do I loop over all cells and update CellData with the accumulated contributions in CellUpdate. This means that the data I read in computeFlux() does not change during the flux computations, so the order in which computeFlux() is called for the different faces should not matter.
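
For illustration, updateCell(id) essentially does the following (a simplified sketch: the real update also involves quantities like the cell volume, but only the roles of CellData and CellUpdate matter for the data dependencies):

// Simplified sketch: CellData is written only here, after all calls to
// computeFlux() have finished, and the accumulator is reset afterwards.
void GKSSolver::updateCell(const idType id)
{
    CellData[id].rho  += CellUpdate[id].rho;
    CellData[id].rhoU += CellUpdate[id].rhoU;
    CellData[id].rhoV += CellUpdate[id].rhoV;
    CellData[id].rhoE += CellUpdate[id].rhoE;

    CellUpdate[id].rho  = 0.0;
    CellUpdate[id].rhoU = 0.0;
    CellUpdate[id].rhoV = 0.0;
    CellUpdate[id].rhoE = 0.0;
}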

The computeFlux() method performs many computations:

void GKSSolver::computeFlux(const idType id)
{
    const int NUMBER_OF_MOMENTS = 7;

    // local working variables: one set per call, i.e. per thread
    PrimitiveVariable prim;
    ConservedVariable normalGradCons;
    ConservedVariable timeGrad;

    ConservedVariable InterfaceFlux;

    double a[4];
    double b[4] = {0.0, 0.0, 0.0, 0.0};
    double A[4];

    double MomentU[NUMBER_OF_MOMENTS];
    double MomentV[NUMBER_OF_MOMENTS];
    double MomentXi[NUMBER_OF_MOMENTS];

    // reconstruct the interface state and its normal gradient from the
    // adjacent cells (read-only access to cell data)
    prim = reconstructPrimPiecewiseConstant(id);
    normalGradCons = differentiateConsNormal(id, prim.rho);

    // transform into the local coordinate system of the face
    this->global2local(id, prim);
    this->global2local(id, normalGradCons);

    // expansion coefficients, moments and time derivative, then assemble the flux
    this->computeMicroSlope(prim, normalGradCons, a);
    this->computeMoments(prim, MomentU, MomentV, MomentXi, NUMBER_OF_MOMENTS);
    timeGrad = this->computeTimeDerivative(MomentU, MomentV, MomentXi, a, b);

    this->computeMicroSlope(prim, timeGrad, A);
    double tau = 2.0*prim.L * this->fluidParam.nu;
    double timeCoefficients[3] = { this->dt, -tau*this->dt, 0.5*this->dt*this->dt - tau*this->dt };
    InterfaceFlux = this->assembleFlux(MomentU, MomentV, MomentXi, a, b, A, timeCoefficients, prim, this->getInterfaceArea(id), tau);
    this->local2global( id, InterfaceFlux );

    // ========================================================================

    // the only write into shared data happens here
    this->applyFlux(id, InterfaceFlux);
}

All the methods above the comment line either work only on local variables within computeFlux() (which should therefore be private to each thread) or read data from non-local variables. The only non-local write operations (writes to shared data) take place in applyFlux():

void GKSSolverPush::applyFlux(idType id, ConservedVariable flux)
{
    // add the flux to the cell on one side of the face and subtract it from the
    // cell on the other side; the atomics protect against different faces
    // (possibly processed by different threads) updating the same cell simultaneously
    if( this->InterfaceAdd2Cell[id][0] )
    {
        #pragma omp atomic
        this->CellUpdate[ this->Interface2Cell[id][0] ].rho  += flux.rho ;
        #pragma omp atomic
        this->CellUpdate[ this->Interface2Cell[id][0] ].rhoU += flux.rhoU;
        #pragma omp atomic
        this->CellUpdate[ this->Interface2Cell[id][0] ].rhoV += flux.rhoV;
        #pragma omp atomic
        this->CellUpdate[ this->Interface2Cell[id][0] ].rhoE += flux.rhoE;
    }

    if( this->InterfaceAdd2Cell[id][1] )
    {
        #pragma omp atomic
        this->CellUpdate[ this->Interface2Cell[id][1] ].rho  -= flux.rho ;
        #pragma omp atomic
        this->CellUpdate[ this->Interface2Cell[id][1] ].rhoU -= flux.rhoU;
        #pragma omp atomic
        this->CellUpdate[ this->Interface2Cell[id][1] ].rhoV -= flux.rhoV;
        #pragma omp atomic
        this->CellUpdate[ this->Interface2Cell[id][1] ].rhoE -= flux.rhoE;
    }
}

The write operations in applyFlux() are not thread-safe, since multiple faces (which might be processed by different threads) might update the CellUpdate of a specific cell simultaneously. Therefore I protect these operations with #pragma omp atomic.

Occurring Error

When I run the program with and without #pragma omp parallel for I get different results, which means that the parallelization corrupts my computation at some point. Where that happens is unknown to me, though.

What does work and what does not work

To identify the error I tried different OpenMP pragmas.

Adding a critical section around the non-thread-safe part does not solve the problem:

#pragma omp critical
this->applyFlux(id, InterfaceFlux);

Making the non-thread-safe part ordered solves the problem:

#pragma omp ordered
this->applyFlux(id, InterfaceFlux);
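
Note that #pragma omp ordered only has a defined effect when the enclosing loop carries the ordered clause, so for this experiment the flux loop has to look as follows, which effectively serializes the applyFlux() calls in iteration order:

#pragma omp parallel for ordered
for ( int id = 0; id < numberOfInterfaces; ++id )
    computeFlux(id);   // contains the "#pragma omp ordered" region around applyFlux()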

Question

In which ways can the parallelization with #pragma omp parallel for introduce errors in the computation?

Are local variables (in the scope of a function called inside the shared workload) always private, or might they also be shared? And if they can be shared, how would I declare them private?
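
To illustrate what I mean, here is a minimal standalone sketch (not my solver code) of a local variable inside a function that is called from a parallel loop:

#include <cstdio>

// x is an automatic (stack) variable, so every thread that calls work()
// gets its own copy; only static locals, globals, or members reached
// through a shared object would be shared between threads.
void work(int i)
{
    double x = 2.0 * i;
    std::printf("%d -> %f\n", i, x);
}

int main()
{
    #pragma omp parallel for
    for (int i = 0; i < 8; ++i)
        work(i);
    return 0;
}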

Can race conditions also appear while reading data? Multiple faces (i.e. multiple threads) might read the same cell data simultaneously in my code. As far as I understand it, that should not cause race conditions, but I might be wrong.
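
As a minimal standalone illustration of the access pattern I mean (plain arrays instead of my cell data): the shared vector is only read inside the loop, and each iteration writes to a distinct element:

#include <cstdio>
#include <vector>

int main()
{
    std::vector<double> cellData(100, 1.0);  // shared, but read-only inside the loop
    std::vector<double> faceFlux(99, 0.0);   // each iteration writes its own element

    #pragma omp parallel for
    for (int id = 0; id < 99; ++id)
    {
        // concurrent reads of the same cellData entries by different threads
        faceFlux[id] = cellData[id + 1] - cellData[id];
    }

    std::printf("%f\n", faceFlux[0]);
    return 0;
}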

My first guess was that I have a race condition in applyFlux(), but neither atomic nor critical helps. The order of execution should not be a problem either, since the only data I write to (which is shared between the threads) is not read in any call of computeFlux().

I would be glad if you could give me some hints where I should look for errors.

Thank you very much,

Stephan

P.S.: I am sorry for the long post, but I was not able to reproduce the problem in a minimal example and therefore had to post parts of my actual code.

  • As far as I know, data race conditions can be: Read After Write, Write After Read, Write After Write. For more details see https://en.wikipedia.org/wiki/Hazard_(computer_architecture). I guess a race condition could also happen when reading, if the resource only allows a limited number of simultaneous reads. :( – khôi nguyễn Nov 20 '16 at 12:36
  • The effect of atomic and critical is much the same (http://stackoverflow.com/questions/7798010/openmp-atomic-vs-critical). I guess your code depends on the order of execution, which is why ordered works. – khôi nguyễn Nov 20 '16 at 12:40
  • Thanks for your comments. The execution order should not be a problem, since I only write to shared data that I do not read anywhere in the flux computation. I added a small explanation of this! – Lenz Nov 20 '16 at 13:11
  • What exactly do you mean by "I get different results"? What do you get and what do you expect to get? – Harald Nov 21 '16 at 10:49
  • You should take care with the assignments you make in `computeFlux`. You are writing into members of `GKSSolver`: `this->prim = reconstructPrimPiecewiseConstant(id);` but `this` may not be private. – Jorge Bellon Nov 21 '16 at 19:24
  • Thank you for your comments! @Harald: I have different versions of the algorithm, and they converge to plausible physical results. This version, however, does not always do so when I parallelize it with OpenMP. I recently compared the serial and parallel execution timestep by timestep, and the deviation of the parallel version appears only after tens of thousands of timesteps. Before that, all values seem to be identical up to all digits that the VS debugger shows me. This seems illogical to me. – Lenz Nov 23 '16 at 09:35
  • @JorgeBellón: I have to apologize; I made a mistake when I copied the code into the post. I added some `this->` to make things clearer, but unfortunately also added it in two wrong places. I have corrected that: `prim` and `normalGradCons` are local variables, not member attributes. I am sorry for the confusion. – Lenz Nov 23 '16 at 09:35
  • Maybe your algorithm is simply numerically unstable and sensitive to order of summation... See [this post](http://stackoverflow.com/a/27680902/5239503) for more details. – Gilles Nov 23 '16 at 09:37
