I'm trying to write a piece of C++ code (VS2010) that runs in parallel using OpenMP. At first everything runs perfectly: all of my processors are busy and the for loop progresses as expected. But when I reach time step i = 211, everything slows down, and in the process monitor I see that I'm using only 14% of the CPU. After a while it speeds up again, then slows down again at time step i = 316. It does that periodically until it finishes. I'm not sure what is going on. I'm new to this, so please forgive me if my question isn't clear enough.

`xyQuadrant` is a vector created earlier in the code; it contains structures. The methods `GetUX(..)`, `GetUY(..)`, `CalcVelocity(...)` and `CalcDisplacement(..)` lock and unlock when accessing data, so there shouldn't be any issues with multiple threads accessing the same data.

This is the code:

```cpp
for(int i = 0; i < 1250; i++)
{
    #pragma omp parallel num_threads(numCPU) shared(xyQuadrant)
    {
        #pragma omp for
        for(int j = 0; j < xyQuadrant.size(); j++)
        {
            SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
            for(int k = 0; k < xyQuadrant[j].qPoints.size(); k++)
            {
                point* targetPoint = &xyQuadrant[j].qPoints[k];
                double strainX = 0;
                double strainY = 0;
                for (int n = 0; n < targetPoint->family.size(); n++)
                {
                    point* curPoint = targetPoint->family[n];

                    if(fabs(curPoint->x - targetPoint->x) < 0.000001)
                    {
                        strainX = 0;
                    }
                    else
                    {
                        double directionX = (curPoint->GetUX(i-1) - targetPoint->GetUX(i-1) + curPoint->x - targetPoint->x)/fabs((curPoint->GetUX(i-1) - targetPoint->GetUX(i-1) + curPoint->x - targetPoint->x));
                        double stretchX = fabs((curPoint->GetUX(i-1) - targetPoint->GetUX(i-1) + curPoint->x - targetPoint->x - fabs((curPoint->x - targetPoint->x))))/fabs(curPoint->x - targetPoint->x);

                        strainX += directionX*c*stretchX*targetPoint->volumeCorrect[n]*(targetPoint->surfaceCorrectX + curPoint->surfaceCorrectX)/2;
                    }

                    if(fabs(curPoint->y - targetPoint->y) < 0.000001)
                    {
                        strainY = 0;
                    }
                    else
                    {
                        double directionY = (curPoint->GetUY(i-1) - targetPoint->GetUY(i-1) + curPoint->y - targetPoint->y)/fabs((curPoint->GetUY(i-1) - targetPoint->GetUY(i-1) + curPoint->y - targetPoint->y));
                        double stretchY = fabs((curPoint->GetUY(i-1) - targetPoint->GetUY(i-1) + curPoint->y - targetPoint->y - fabs((curPoint->y - targetPoint->y))))/fabs(curPoint->y - targetPoint->y);

                        strainY += directionY*c*stretchY*targetPoint->volumeCorrect[n]*(targetPoint->surfaceCorrectY+curPoint->surfaceCorrectY)/2;
                    }
                }

                targetPoint->aX = strainX*deltaV/density;
                targetPoint->aY = strainY*deltaV/density;
                targetPoint->CalcVelocity(deltaT);
                targetPoint->CalcDisplacement(deltaT,i-1);
            }
        }
    }
}
```

On a final note: I'm using an i7-3770 processor (4 cores, 8 threads). When everything slows down, I can see only 4 threads working, and the other 4 are shown as "CPU parked"!
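Each axis of the inner loop calls the locking getters six times to build the same difference. A minimal sketch of how that arithmetic could be factored so each getter is called once per point and axis is shown here; `BondTerm` and `bondTerm` are hypothetical names, not part of the posted code, and the formulas are copied from the `directionX`/`stretchX` expressions without changing their behavior (including the division by zero when the current separation is exactly zero).

```cpp
#include <cmath>

// Hypothetical helper type: the two factors the inner loop needs per axis.
struct BondTerm {
    double direction;  // sign of the current separation (+1 or -1)
    double stretch;    // same expression as stretchX / stretchY in the question
};

// dx0: reference separation, e.g. curPoint->x - targetPoint->x
// du:  displacement difference, e.g.
//      curPoint->GetUX(i-1) - targetPoint->GetUX(i-1), read once by the caller
inline BondTerm bondTerm(double dx0, double du) {
    const double current = du + dx0;  // current separation along this axis
    BondTerm t;
    t.direction = current / std::fabs(current);
    t.stretch = std::fabs(current - std::fabs(dx0)) / std::fabs(dx0);
    return t;
}
```

In the loop this would be used as `BondTerm bx = bondTerm(curPoint->x - targetPoint->x, curPoint->GetUX(i-1) - targetPoint->GetUX(i-1));`, and likewise for the Y axis.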

Bozo Vazic
  • What is `xyQuadrant.size()`? How big is it? Also, try setting thread affinity to avoid core parking by the OS. – Anton Nov 25 '14 at 17:12
  • `xyQuadrant` is a vector of structures, where each structure contains a vector of the points that belong to it. `xyQuadrant` has 7140 members and each `qPoints` vector has about 36 members! – Bozo Vazic Nov 25 '14 at 17:25
  • Well, that's big enough that load balancing shouldn't be a concern. The thread affinity suggestion still stands. If the freezes correlate with specific `i` values, check the code that depends on `i`, i.e. `CalcDisplacement` and `GetUX`/`GetUY`. – Anton Nov 25 '14 at 17:38
  • I'm not sure how to set thread affinity in Visual Studio! Some help with that would be appreciated! – Bozo Vazic Nov 25 '14 at 18:13
  • Move `SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST)` before `#pragma omp for`. – Z boson Nov 26 '14 at 08:34
  • Read through [this question](http://stackoverflow.com/questions/67554/whats-the-best-free-c-profiler-for-windows) and pick a tool from there. – Hristo Iliev Nov 26 '14 at 11:50
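The two suggestions in the comments above (set thread affinity, and move the priority call before `#pragma omp for`) could be combined roughly as follows. This is only a sketch: `pinCurrentThread`, `threadIndex` and `runTimeSteps` are hypothetical names, the loop body is a stand-in for the per-quadrant work, and it assumes a single processor group with thread `t` pinned to logical CPU `t`. The Win32 calls are guarded so the file also compiles (as a no-op) on other platforms.

```cpp
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: pin the calling thread to one logical CPU and raise
// its priority. Windows only; does nothing on other platforms.
inline void pinCurrentThread(int cpuIndex) {
#ifdef _WIN32
    // One mask bit per logical CPU; skip indices that do not fit in the mask.
    if (cpuIndex >= 0 && cpuIndex < static_cast<int>(sizeof(DWORD_PTR) * 8)) {
        SetThreadAffinityMask(GetCurrentThread(),
                              static_cast<DWORD_PTR>(1) << cpuIndex);
    }
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
#else
    (void)cpuIndex;
#endif
}

// Index of the calling thread in the current team (0 without OpenMP).
inline int threadIndex() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Stand-in for the time-step loop: each step adds 1.0 to every element.
inline void runTimeSteps(std::vector<double>& values, int steps, int numCPU) {
    const int n = static_cast<int>(values.size());  // signed loop bound
    for (int i = 0; i < steps; ++i) {
        #pragma omp parallel num_threads(numCPU) shared(values)
        {
            // Runs once per thread per region, not once per j iteration.
            pinCurrentThread(threadIndex());

            #pragma omp for
            for (int j = 0; j < n; ++j) {
                values[j] += 1.0;  // placeholder for the per-quadrant work
            }
        }
    }
}
```

Whether pinning actually prevents the parked cores seen here is not established by this thread; it is only what the comment proposes trying.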

1 Answer

Actually, I have figured out why this is happening: it is because of memory. I was trying to save the results for each point in each quadrant for all time steps. I was saving the results in vectors of doubles, and because of that the quadrant structures were becoming too large. When I saved only the current time step's data, everything worked fine! Thanks, everybody, for the help.
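For scale, using the numbers from the comments: 7140 quadrants × about 36 points × 1250 steps × 8 bytes is roughly 2.6 GB for each stored quantity. A minimal sketch of keeping only the two most recent steps per point is given here, for the X direction only. `PointState` is a hypothetical, reduced stand-in for the real point structure, and the update rule inside `CalcVelocity`/`CalcDisplacement` is an assumption, since the original methods are not shown.

```cpp
// Hypothetical reduced point: stores displacement for two steps only,
// instead of one vector entry per time step.
struct PointState {
    double ux[2];  // displacements of the two most recent steps
    double vx;     // velocity
    double ax;     // acceleration, set by the strain loop

    PointState() : vx(0.0), ax(0.0) { ux[0] = 0.0; ux[1] = 0.0; }

    // Map any step number (including -1) to slot 0 or 1.
    static int slot(int step) { return ((step % 2) + 2) % 2; }

    double GetUX(int step) const { return ux[slot(step)]; }

    // Assumed update rule, for illustration only.
    void CalcVelocity(double dt) { vx += ax * dt; }

    // Reads the slot of prevStep and overwrites the other slot, so values
    // older than prevStep are discarded instead of accumulated.
    void CalcDisplacement(double dt, int prevStep) {
        ux[slot(prevStep + 1)] = ux[slot(prevStep)] + vx * dt;
    }
};
```

Because a step reads only the previous slot and writes only the other one, the threads reading `GetUX(i-1)` from neighbouring points never touch the slot being written during that step.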

Z boson
Bozo Vazic