I have a very big binary tree on which I want to do certain expensive calculations. I want to parallelize these methods using openmp and its task pragmas.
As a test, I have parallelized the freeTree
function and I have created a big example test tree. In order to prevent spawning many tasks, I have limited the tasks creation to the top two levels of the tree. So effectively only 4 tasks are created.
Below is the minimal working example:
#include <chrono>
#include <iostream>
class Node {
public:
int data;
Node* l;
Node* r;
Node(Node* left, Node* right) : l(left), r(right) {}
};
Node* createRandomTree(int depth) {
if (depth == 0)
return new Node(NULL, NULL);
return new Node(createRandomTree(depth - 1), createRandomTree(depth - 1));
}
void freeTree(Node* tree) {
if (tree == NULL) return;
freeTree(tree->l);
freeTree(tree->r);
delete tree;
}
void freeTreePar(Node* tree, int n = 0) {
if (tree == NULL) return;
Node *l = tree->l, *r = tree->r;
if (n < 2) {
#pragma omp task
freeTreePar(l, n + 1);
#pragma omp task
freeTreePar(r, n + 1);
} else {
freeTree(tree->l);
freeTree(tree->r);
}
// taskwait is not necessary
delete tree;
}
int main(int argc, char const *argv[])
{
std::chrono::time_point<std::chrono::system_clock> start, end;
Node* tree = createRandomTree(22);
start = std::chrono::system_clock::now();
#pragma omp parallel shared(tree)
{
#pragma omp single nowait
freeTreePar(tree);
}
end = std::chrono::system_clock::now();
std::chrono::duration<double> elapsed_seconds = end-start;
std::time_t end_time = std::chrono::system_clock::to_time_t(end);
std::cout << "finished computation at " << std::ctime(&end_time)
<< "elapsed time: " << elapsed_seconds.count() << "s\n";
return 0;
}
When I run this code, it takes about 0.38 seconds to free the tree. However, if I just call freeTree(root)
directly, it only takes 0.2 seconds. So even though the tree is very big (2^22 elements) and even though in this particular test case the tasks are of equal size, I cannot get a performance increase using parallel methods.
Am I doing something wrong? Do you know how I can improve this code? Thanks!