I am quite new to OpenMP, but I've recently started to play with it and now want to use it to speed up some calculations. I have a function of three variables that I want to evaluate at every point of a grid, writing the values to a file. The ranges of the three parameters are quite large, so there will be a lot of function evaluations, and a serial triple nested loop becomes slow.
The serial implementation is straightforward: a triple nested loop in which each index i, j, k takes the values of the corresponding parameter (integers from 1 to spaceDIM), and the function is evaluated at the point (i, j, k).
For the OpenMP approach, I thought of using #pragma omp parallel for, hoping that the program would run faster.
Here is the code I wrote for the serial implementation and the "parallel" one. Please keep in mind that spaceDIM is set to a small number here just for testing purposes.
#include <iostream>
#include <chrono>
#include <omp.h>
#include <cmath>
#include <fstream>
using namespace std;
ofstream outparallel("../output/outputParallel.dat");
ofstream outserial("../output/outputSerial.dat");
const int spaceDIM = 80;
double myFunction(double a, double b, double c)
{
    return a * log(b) + exp(a / b) + c;
}
void serialAlgorithmTripleLoop()
{
    auto sum = 0;
    auto timeStart = chrono::high_resolution_clock::now();
    for (int i = 1; i <= spaceDIM; ++i)
        for (int j = 1; j <= spaceDIM; ++j)
            for (int k = 1; k <= spaceDIM; ++k)
            {
                //sum += i * j * k;
                outserial << i << " " << j << " " << k << " " << myFunction((double)i, (double)j, (double)k) << endl;
            }
    auto timeStop = chrono::high_resolution_clock::now();
    auto execTime = chrono::duration_cast<chrono::seconds>(timeStop - timeStart).count();
    cout << "Serial execution time = " << execTime << " seconds";
    cout << endl;
    outserial << "Execution time = " << execTime << " seconds";
    outserial << endl;
}
void parallelAlgorithmTripleLoop()
{
    //start of the actual algorithm
    auto sum = 0;
    auto timeStart = chrono::high_resolution_clock::now();
#pragma omp parallel for
    for (int i = 1; i <= spaceDIM; ++i)
        for (int j = 1; j <= spaceDIM; ++j)
            for (int k = 1; k <= spaceDIM; ++k)
            {
                // sum += i * j * k;
                // prints k as well, to match the serial version's output format
                outparallel << i << " " << j << " " << k << " " << myFunction((double)i, (double)j, (double)k) << endl;
            }
    auto timeStop = chrono::high_resolution_clock::now();
    auto execTime = chrono::duration_cast<chrono::seconds>(timeStop - timeStart).count();
    cout << "Parallel execution time = " << execTime << " seconds";
    cout << endl;
    outparallel << "Execution time = " << execTime << " seconds";
    outparallel << endl;
}
int main()
{
    cout << "FUNCTION OPTIMIZATION" << endl;
    serialAlgorithmTripleLoop();
    parallelAlgorithmTripleLoop();
    return 0;
}
The output is unexpected to me: the parallel approach takes longer than the serial one. I also tried the reduction, ordered, and collapse clauses from the OpenMP standard, but none of them helped. I'm running this on a laptop with 4 cores and 8 threads.
FUNCTION OPTIMIZATION
Serial execution time = 4 seconds
Parallel execution time = 7 seconds
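For reference, this is roughly how the reduction and collapse clauses mentioned above can be combined for the commented-out sum. It is only a sketch: sumOfProducts is a hypothetical helper that is not part of the program above, and the accumulator is declared as long long because for spaceDIM = 80 the total no longer fits in the int that auto sum = 0 deduces.

```cpp
#include <cassert>

// Hypothetical helper (not in the original program): sums i * j * k
// over the grid 1..dim in each direction.
long long sumOfProducts(int dim)
{
    assert(dim >= 0);
    long long sum = 0;
    // collapse(3) merges the three loops into one iteration space;
    // reduction(+:sum) gives each thread a private partial sum and
    // adds the partial sums together when the loop finishes.
#pragma omp parallel for collapse(3) reduction(+:sum)
    for (int i = 1; i <= dim; ++i)
        for (int j = 1; j <= dim; ++j)
            for (int k = 1; k <= dim; ++k)
                sum += static_cast<long long>(i) * j * k;
    return sum;
}
```

If the file is compiled without OpenMP support, the pragma is ignored and the function simply runs serially with the same result.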
Q: How can I properly speed up the evaluation of the function?
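One direction that might be worth trying, given as a sketch under assumptions rather than a definitive fix: in the parallel loop above every thread writes to the same ofstream, which is not safe without synchronization, and endl flushes the stream on every line, so the run time is probably dominated by file output rather than by myFunction. The sketch below separates the two steps: the values are computed in parallel into a preallocated vector, and the file is written afterwards by a single thread. The names evaluateGrid and writeGrid and the flat index layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <ostream>
#include <vector>

// Same formula as myFunction in the question.
double myFunction(double a, double b, double c)
{
    return a * std::log(b) + std::exp(a / b) + c;
}

// Hypothetical helper: evaluates myFunction at every (i, j, k) with
// 1 <= i, j, k <= dim. The value for (i, j, k) is stored at the flat
// index ((i - 1) * dim + (j - 1)) * dim + (k - 1).
std::vector<double> evaluateGrid(int dim)
{
    assert(dim > 0);
    std::vector<double> values(static_cast<std::size_t>(dim) * dim * dim);
    // Each iteration writes to its own element, so no locking is needed.
#pragma omp parallel for collapse(3)
    for (int i = 1; i <= dim; ++i)
        for (int j = 1; j <= dim; ++j)
            for (int k = 1; k <= dim; ++k)
            {
                std::size_t idx =
                    (static_cast<std::size_t>(i - 1) * dim + (j - 1)) * dim + (k - 1);
                values[idx] = myFunction(i, j, k);
            }
    return values;
}

// Hypothetical helper: writes the grid serially, in the same order as the
// serial loop, using '\n' instead of endl to avoid flushing on every line.
void writeGrid(const std::vector<double>& values, int dim, std::ostream& out)
{
    std::size_t idx = 0;
    for (int i = 1; i <= dim; ++i)
        for (int j = 1; j <= dim; ++j)
            for (int k = 1; k <= dim; ++k)
                out << i << ' ' << j << ' ' << k << ' ' << values[idx++] << '\n';
}
```

In the program above this would be used as writeGrid(evaluateGrid(spaceDIM), spaceDIM, outparallel). Whether it actually beats the serial version depends on how expensive the function is compared with the output step, so it is worth timing the two stages separately and with a finer resolution than whole seconds.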