I have some C++ code that I'd like to accelerate using OpenACC. I call the C++ from Python using Cython, but when I add OpenACC pragmas it doesn't seem to work. To narrow it down, I tried a simple program (pi.c):
    #include <stdio.h>
    #define N 2000000000
    #define vl 1024
    int main(void) {
        double pi = 0.0f;
        long long i;
        #pragma acc parallel vector_length(vl)
        #pragma acc loop reduction(+:pi)
        for (i=0; i<N; i++) {
            double t= (double)((i+0.5)/N);
            pi +=4.0/(1.0+t*t);
        }
        printf("pi=%11.10f\n",pi/N);
        return 0;
    }
Compiling pi.c with
    gcc pi.c -fopenacc -foffload=nvptx-none -foffload="-O3" -O3 -o gpu.x
works perfectly.
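For reference, the loop in pi.c is a midpoint-rule estimate of the integral of 4/(1+t^2) over [0, 1], which equals pi. A minimal pure-Python sketch of the same arithmetic is below; the function name `midpoint_pi` and the much smaller `n` are illustrative choices, not part of the original program, and no GPU or OpenACC is involved.

```python
import math


def midpoint_pi(n):
    """Midpoint-rule estimate of the integral of 4/(1+t^2) over [0, 1]."""
    total = 0.0
    for i in range(n):
        t = (i + 0.5) / n            # midpoint of the i-th subinterval, as in pi.c
        total += 4.0 / (1.0 + t * t)
    return total / n                 # same final division as pi/N in the printf


# n is a small stand-in for N = 2000000000 so the loop finishes quickly.
estimate = midpoint_pi(100_000)
print(f"pi={estimate:11.10f}")
print("abs. difference from math.pi:", abs(estimate - math.pi))
```

This can serve as a quick sanity check on the value any build of pi.c prints.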
I then tried wrapping pi.c in Cython with a very simple .pyx file (cpi.pyx):
    cdef extern from "pi.c":
        int main()

    def func():
        return main()
My setup.py file looks like this:
    from distutils.core import setup
    from distutils.extension import Extension
    from Cython.Build import cythonize

    extensions = [Extension(name="cpi",
                            sources=["cpi.pyx"],
                            extra_compile_args=["-O3", "-fopenacc", "-foffload=nvptx-none"],
                            extra_link_args=["-lgomp", "-fno-lto"],
                            language="c")]

    setup(ext_modules = cythonize(extensions))
Building with
    python setup.py build_ext --inplace
works, but when I try to call the function from Python I get the error "libgomp: target function wasn't mapped".
When I try building without the -fno-lto flag, I instead get the error "ImportError: ....so: undefined symbol: __offload_func_table" after calling it in Python.
I haven't found anything similar on the internet...
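One variant of the Extension arguments that may be worth testing is sketched below. It rests on an assumption: that GCC needs the offload flags at the link step as well as the compile step, so `-fopenacc` and `-foffload=nvptx-none` are repeated in `extra_link_args` and `-fno-lto` is dropped. Whether this resolves either error is not confirmed here. The names `offload_flags` and `ext_kwargs` are hypothetical helpers.

```python
# Untested sketch: pass the same OpenACC/offload flags to the compiler and the linker.
# Assumption: the link step must also see -fopenacc for the offload tables to be set up.
offload_flags = ["-fopenacc", "-foffload=nvptx-none"]

ext_kwargs = dict(
    name="cpi",
    sources=["cpi.pyx"],
    extra_compile_args=["-O3"] + offload_flags,
    extra_link_args=list(offload_flags),  # no "-fno-lto", no explicit "-lgomp"
    language="c",
)

# In setup.py this dictionary would be used as:
#     extensions = [Extension(**ext_kwargs)]
#     setup(ext_modules=cythonize(extensions))
print(ext_kwargs["extra_link_args"])
```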