I've been trying to run some parallelised code in C++ from Python through cppyy but am facing an error.

The standalone executable (compiled with GCC using -fopenmp -O2) runs without errors and shows the expected drop in runtime from parallelisation.

When the #pragma omp parallel for is commented out of the C++ code, cppyy doesn't raise any errors. However, when the pragma is part of the code I get the error below:

IncrementalExecutor::executeFunction: symbol '__kmpc_for_static_fini' unresolved while linking symbol '__cf_4'!
IncrementalExecutor::executeFunction: symbol '__kmpc_for_static_init_4' unresolved while linking symbol '__cf_4'!
IncrementalExecutor::executeFunction: symbol '__kmpc_fork_call' unresolved while linking symbol '__cf_4'!
IncrementalExecutor::executeFunction: symbol '__kmpc_global_thread_num' unresolved while linking symbol '__cf_4'!
Traceback (most recent call last):
  File "...../SO_troubleshooting/example_pll_cppyy_code.py", line 8, in <module>
    output = cppyy.gbl.pll_somelinalgeb()
ValueError: std::vector<std::vector<Eigen::Matrix<double,-1,-1,0,-1,-1> > > ::pll_somelinalgeb() =>
    ValueError: nullptr result where temporary expected

Here is the short Python script:

import cppyy
cppyy.add_include_path('../np_vs_eigen/eigen/')
cppyy.include('easy_example.cpp')
vector = cppyy.gbl.std.vector
import datetime as dt
print('Starting the function call now ')

start = dt.datetime.now()
output = cppyy.gbl.pll_somelinalgeb()
stop = dt.datetime.now()
print((stop-start), 'seconds')

The C++ toy code is below. It generates a random matrix with Eigen, calculates its pseudo-inverse, and then sleeps for 1 ms.

#include <omp.h>
#include <iostream>
#include <Eigen/Dense>
#include <chrono>
#include <cstdlib> // std::srand
#include <ctime>   // time
#include <vector>
#include <thread>

using Eigen::VectorXd;
using Eigen::MatrixXd;

std::vector<MatrixXd> some_linearalgebra(){
    std::vector<MatrixXd> solutions;
    std::srand((unsigned int) time(0));//ensures a new random matrix each time
    MatrixXd arraygeom(5,3);
    arraygeom = MatrixXd::Random(5,3);
    VectorXd row1 = arraygeom.block(0,0,1,3).transpose();
    arraygeom.rowwise() -= row1.transpose();
    MatrixXd pinv_arraygeom(3,5);


    // calculate the pseudoinverse of arraygeom
    pinv_arraygeom = arraygeom.completeOrthogonalDecomposition().pseudoInverse();
    //std::cout << pinv_arraygeom << std::endl;
    solutions.push_back(pinv_arraygeom);
    solutions.push_back(pinv_arraygeom);
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
    return solutions;
}

std::vector<std::vector<MatrixXd>> pll_somelinalgeb(){
    int num_runs = 5000;
    std::vector<std::vector<MatrixXd>> all_solns(num_runs);
    #pragma omp parallel for
    for (int i=0; i<num_runs; i++){
        all_solns[i] = some_linearalgebra();
    }
    return all_solns;
}


int main(){
    std::vector<MatrixXd> main_out;
    main_out = some_linearalgebra();

    auto start = std::chrono::system_clock::now();

    std::vector<std::vector<MatrixXd>> main2_out;
    main2_out = pll_somelinalgeb();
    auto end = std::chrono::system_clock::now();
    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << " ms" << std::endl;

    return 0;
}

System + OS specs:

  • Ubuntu 18.04.2 LTS, Intel® Core™ i7-10700 CPU @ 2.90GHz × 16, 64-bit
  • Python 3.9.0
  • cppyy 2.4.0 (pip install)
  • Eigen 3.4.0

Non-default precompiled cppyy header used:

Following the linked instructions, I first ran export EXTRA_CLING_ARGS='-fopenmp' in the terminal, then ran the code from that page that uses cppyy_backend.loader to build a new precompiled header, and finally set the CLING_STANDARD_PCH environment variable to point at it with another export.

C++ Executable compiled with g++-11 easy_example.cpp -fopenmp -O2 -I <path_to_Eigen_library here>

Thejasvi

1 Answer

The problem was that the OpenMP runtime library was never loaded into Cling, the compiler behind cppyy, so the __kmpc_* symbols generated by the pragma could not be resolved.

I encountered the same problem on Linux Mint and Windows 11 too, which made me realise it is not an OS-specific problem. What ended up working was the following:

  1. Add the -fopenmp flag to your EXTRA_CLING_ARGS environment variable (export EXTRA_CLING_ARGS='-fopenmp' on Unix; on Windows, go through the Start menu and add a new environment variable). Compile a new precompiled header with the code in the docs here, and define the path to the precompiled header with the CLING_STANDARD_PCH environment variable.
  2. Find the libiomp5 library on your system ($ locate libiomp5 on Unix, or search for libiomp5 in the File Explorer on Windows). The Unix file should be libiomp5.so and the Windows version is libiomp5md.dll.
  3. In your Python module, load the libiomp5 library with cppyy.load_library(<insert path to .so or .dll file here>) before calling the parallelised function.

You should now have parallelised code!
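For reference, steps 2 and 3 could be sketched in Python roughly as below. This is only a sketch: the helper names find_openmp_runtime and load_openmp_into_cppyy are made up for illustration, the search directories in the usage comments are assumptions about where libiomp5 might live, and it assumes EXTRA_CLING_ARGS was already exported as in step 1. As far as I can tell cppyy picks up that variable when it is imported, which is why the import is delayed.

```python
import glob
import os


def find_openmp_runtime(search_dirs):
    """Return the first libiomp5 file found under search_dirs, or None.

    Hypothetical helper: it only automates the manual 'locate libiomp5' step.
    """
    patterns = ("libiomp5.so*", "libiomp5md.dll")
    for directory in search_dirs:
        for pattern in patterns:
            # '**' with recursive=True also matches files directly in 'directory'.
            matches = sorted(
                glob.glob(os.path.join(directory, "**", pattern), recursive=True)
            )
            if matches:
                return matches[0]
    return None


def load_openmp_into_cppyy(library_path):
    """Load the OpenMP runtime so Cling can resolve the __kmpc_* symbols."""
    if "-fopenmp" not in os.environ.get("EXTRA_CLING_ARGS", ""):
        raise RuntimeError("set EXTRA_CLING_ARGS to include -fopenmp before importing cppyy")
    import cppyy  # imported late so the environment variable is already in place
    cppyy.load_library(library_path)
    return cppyy


# Example use (search directories are assumptions; adjust them to your system):
# lib = find_openmp_runtime(["/usr/lib", "/opt/intel"])
# cppyy = load_openmp_into_cppyy(lib)
# cppyy.add_include_path('../np_vs_eigen/eigen/')
# cppyy.include('easy_example.cpp')
# output = cppyy.gbl.pll_somelinalgeb()
```

Searching a large directory tree recursively can be slow, so passing the directory that locate reported is probably the more practical choice.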

This gets you to working code, but is admittedly a rather manual approach - would be happy to hear a more automated approach.

Thejasvi
    Thanks! I've been out, so I wasn't able to answer. Two notes, though: 1) if you set `EXTRA_CLING_ARGS`, no default flags will be applied, so I recommend adding those as well, i.e. EXTRA_CLING_ARGS='-fopenmp -O2 -g'. These are flags to Cling, so they should work on all platforms. 2) the PCH should rebuild automatically based on the presence of `-fopenmp` (you will have two PCHs at that point). – Wim Lavrijsen Sep 29 '22 at 00:49
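To illustrate the first note in the comment, one way to make sure -fopenmp is added without losing other flags is sketched below. The helper with_openmp_flags is hypothetical, and the fallback flags -O2 -g are simply the ones suggested in the comment, not something I have verified as Cling's defaults.

```python
import os


def with_openmp_flags(current, defaults=("-O2", "-g")):
    """Return a flag string that contains -fopenmp.

    If 'current' is empty, fall back to 'defaults' (taken from the comment
    above), because setting EXTRA_CLING_ARGS replaces the default flags.
    """
    flags = current.split() if current.strip() else list(defaults)
    if "-fopenmp" not in flags:
        flags.insert(0, "-fopenmp")
    return " ".join(flags)


# Set the variable before cppyy is imported anywhere in the process.
os.environ["EXTRA_CLING_ARGS"] = with_openmp_flags(
    os.environ.get("EXTRA_CLING_ARGS", "")
)
# import cppyy  # import only after the variable has been set
```

With an empty starting value this produces '-fopenmp -O2 -g', the same string as in the comment.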