I have many functions that loop over data stored in a 4D array. We use OpenMP to parallelize these loops in the places where it makes the most sense and where iterations do not overwrite each other's data.
For instance, I have the following snippet of code:
```cpp
void MaximumCombiner::createComplexData(double**** input, const int nX, const int nY,
                                        const int nZ, const int nA,
                                        std::complex<double>**** output,
                                        const double* beamData)
{
    for(int iX = 0; iX < nX; ++iX)
    {
        for(int iY = 0; iY < nY; ++iY)
        {
            // beamData holds angles in degrees; convert to radians and
            // form the per-beam phase factor exp(i * angle).
            std::complex<double> complexArg(0.0, beamData[iY] * M_PI / 180.0);
            std::complex<double> complexExp = std::exp(complexArg);
            for(int iZ = 0; iZ < nZ; ++iZ)
            {
                for(int iA = 0; iA < nA; ++iA)
                {
                    output[iX][iY][iZ][iA] = input[iX][iY][iZ][iA] * complexExp;
                }
            }
        }
    }
}
```
Originally, I thought I should add a #pragma omp parallel for before each for-loop, but now I wonder whether I am spending more time on the overhead of creating and destroying threads than on actual work. I have also tried placing #pragma omp parallel above the first for-loop and a #pragma omp for on one of the inner loops, but I am not sure this is best either. What should I look for when deciding where to place my OpenMP pragmas?
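For reference, here is a minimal sketch of the second arrangement I tried, written as a hypothetical free function (the name createComplexDataParallel and the choice of the iY loop for the worksharing construct are just for illustration). The parallel region wraps the whole nest, so threads are created once; every thread runs the iX loop, but the iY iterations are divided among threads, so no element of output is written by more than one thread.

```cpp
#include <cmath>
#include <complex>

#ifndef M_PI
#define M_PI 3.14159265358979323846  // some compilers do not define M_PI
#endif

// Sketch: one parallel region around the entire nest, with the
// work-sharing "for" on the iY loop. All threads execute the iX loop,
// while the iY iterations (and the per-beam phase factor computed
// inside them) are split among the threads.
void createComplexDataParallel(double**** input, const int nX, const int nY,
                               const int nZ, const int nA,
                               std::complex<double>**** output,
                               const double* beamData)
{
    #pragma omp parallel
    {
        for(int iX = 0; iX < nX; ++iX)
        {
            #pragma omp for
            for(int iY = 0; iY < nY; ++iY)
            {
                // beamData holds angles in degrees; convert to radians.
                const std::complex<double> complexExp =
                    std::exp(std::complex<double>(0.0, beamData[iY] * M_PI / 180.0));
                for(int iZ = 0; iZ < nZ; ++iZ)
                {
                    for(int iA = 0; iA < nA; ++iA)
                    {
                        output[iX][iY][iZ][iA] = input[iX][iY][iZ][iA] * complexExp;
                    }
                }
            }
        }
    }
}
```

Note that the implicit barrier at the end of each omp for makes the threads synchronize once per iX iteration, which is part of why I am unsure this placement is best.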