I have the following simple Cython function for a parallel reduction:
# cython: boundscheck = False
# cython: initializedcheck = False
# cython: wraparound = False
# cython: cdivision = True
# cython: language_level = 3
from cython.parallel import parallel, prange
cpdef double simple_reduction(int n, int num_threads):
    cdef int i
    cdef int sum = 0
    for i in prange(n, nogil=True, num_threads=num_threads):
        sum += 1
    return sum
To my horror, it returns the following:
In [3]: simple_reduction(n=10, num_threads=1)
Out[3]: 10.0
In [4]: simple_reduction(n=10, num_threads=2)
Out[4]: 20.0
In [5]: simple_reduction(n=10, num_threads=3)
Out[5]: 30.0
In other words, each thread appears to run all n iterations of the loop, instead of the n iterations being distributed across the threads. Any idea what's going on?
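To make that hypothesis concrete, here is a small pure-Python model of the two behaviours. It is only an illustration of the arithmetic, not the Cython code or what OpenMP actually does internally; the function names `split_iterations` and `repeated_iterations` are made up for this sketch, and the round-robin chunking is an assumption chosen for simplicity.

```python
def split_iterations(n, num_threads):
    """Expected behaviour: the n iterations are divided among the threads."""
    # Hypothetical round-robin split: thread t handles t, t + num_threads, ...
    partial_sums = [len(range(t, n, num_threads)) for t in range(num_threads)]
    # The reduction adds the per-thread partial sums together.
    return float(sum(partial_sums))


def repeated_iterations(n, num_threads):
    """Suspected behaviour: every thread runs all n iterations itself."""
    partial_sums = [n for _ in range(num_threads)]
    return float(sum(partial_sums))


for threads in (1, 2, 3):
    print(threads, split_iterations(10, threads), repeated_iterations(10, threads))
```

The second function reproduces the 10.0, 20.0, 30.0 pattern I am seeing, while the first always gives 10.0, which is what I expected from prange.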
I am using Python 3.7.1 and Cython 0.29.2 on macOS Mojave 10.14.3.
UPDATE: Here's my setup.py file:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
from Cython.Build import cythonize
import os
import sys
if sys.platform == 'darwin':
    os.environ['CC'] = 'gcc-8'
    os.environ['CXX'] = 'g++-8'

EXT_MODULES = [Extension('foo', ['foo.pyx'],
                         extra_compile_args=['-fopenmp'],
                         extra_link_args=['-fopenmp'])]

setup(name='foo',
      ext_modules=cythonize(EXT_MODULES))
I have installed GCC separately and have to set the 'CC' and 'CXX' environment variables on macOS to avoid the problem of macOS aliasing those to clang.
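In case it helps with diagnosing this, below is a minimal sketch for printing the compiler-related settings Python knows about. The helper name `compiler_settings` is made up for this example. Note that it only reports the configured values from the environment and from `sysconfig`; it does not prove which binary the build actually invokes, so treat the output as a hint rather than a definitive answer.

```python
import os
import sysconfig


def compiler_settings():
    """Collect compiler-related settings from the environment and sysconfig."""
    return {
        # Values set by hand (e.g. in setup.py); None if unset.
        "CC (environment)": os.environ.get("CC"),
        "CXX (environment)": os.environ.get("CXX"),
        # Values recorded when the Python interpreter itself was built;
        # these may be None on platforms without a Makefile-based build.
        "CC (sysconfig)": sysconfig.get_config_var("CC"),
        "LDSHARED (sysconfig)": sysconfig.get_config_var("LDSHARED"),
    }


for name, value in compiler_settings().items():
    print(f"{name}: {value}")
```

Running this in the same interpreter that runs setup.py, after the environment variables have been set, should show whether 'CC' and 'CXX' point at gcc-8 and g++-8 as intended.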