I have the following simple Cython function for a parallel reduction:

# cython: boundscheck = False
# cython: initializedcheck = False
# cython: wraparound = False
# cython: cdivision = True
# cython: language_level = 3

from cython.parallel import parallel, prange

cpdef double simple_reduction(int n, int num_threads):
    cdef int i
    cdef int sum = 0

    for i in prange(n, nogil=True, num_threads=num_threads):
        sum += 1
    return sum

Horrifyingly, it returns the following:

In [3]: simple_reduction(n=10, num_threads=1)
Out[3]: 10.0

In [4]: simple_reduction(n=10, num_threads=2)
Out[4]: 20.0

In [5]: simple_reduction(n=10, num_threads=3)
Out[5]: 30.0

In other words, it appears to repeat all n iterations of the loop in every thread instead of dividing the iterations among the threads. Any idea what's going on?

I am using Python 3.7.1 and Cython 0.29.2 on macOS Mojave 10.14.3.
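One way to check for this symptom on any build is to call the function with increasing thread counts and flag every result that differs from n. The sketch below uses a hypothetical helper, find_overcounting (not part of Cython). With the real extension you would pass foo.simple_reduction, assuming the function lives in the foo.pyx built by the setup.py shown below; the two stand-in functions here only mimic a broken and a working build.

```python
from typing import Callable, Dict


def find_overcounting(
    reduce_fn: Callable[..., float], n: int, max_threads: int
) -> Dict[int, float]:
    """Call reduce_fn(n=n, num_threads=t) for t = 1..max_threads and return
    {t: result} for every thread count whose result is not n.

    A correct prange reduction that adds 1 per iteration returns n for any t;
    a result of n * t suggests every thread ran the whole loop.
    """
    bad = {}
    for t in range(1, max_threads + 1):
        result = reduce_fn(n=n, num_threads=t)
        if result != n:
            bad[t] = result
    return bad


if __name__ == "__main__":
    # Hypothetical stand-ins; replace with foo.simple_reduction.
    def buggy(n, num_threads):
        return float(n * num_threads)  # mimics the output in the question

    def correct(n, num_threads):
        return float(n)

    print(find_overcounting(buggy, n=10, max_threads=3))    # {2: 20.0, 3: 30.0}
    print(find_overcounting(correct, n=10, max_threads=3))  # {}
```

An empty dict means every thread count produced n; with the build described in the question you would expect entries for 2 and 3 threads.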

UPDATE: Here's my setup.py file:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
from Cython.Build import cythonize

import os
import sys

if sys.platform == 'darwin':
    os.environ['CC'] = 'gcc-8'
    os.environ['CXX'] = 'g++-8'

EXT_MODULES = [Extension('foo', ['foo.pyx'],
               extra_compile_args=['-fopenmp'],
               extra_link_args=['-fopenmp'])]

setup(name='foo',
      ext_modules=cythonize(EXT_MODULES))

I have installed GCC separately and have to set the environment variables 'CC' and 'CXX' on OSX, because there gcc and g++ are aliases for clang.

aschein
  • Your example looks to work correctly to me - I always get 10. It's possible that this relates to how you're compiling it (so edit the question to show your setup script, or whatever alternative you're using) – DavidW Feb 20 '19 at 07:27
  • @DavidW: updated with my setup.py and a note about GCC. – aschein Feb 20 '19 at 20:16
  • There's nothing obviously wrong there. My next suggestion would be to look at the .c file Cython generates and check whether it contains `reduction(+:__pyx_v_sum)` (it should!). After that I don't know - I don't have OSX myself so can't reproduce it, but hopefully someone else can help. – DavidW Feb 20 '19 at 21:40
  • @DavidW: thanks for your help. I looked in the .c and found: #pragma omp parallel reduction(+:__pyx_v_sum) num_threads(__pyx_v_num_threads) – aschein Feb 21 '19 at 16:34
  • Good - that suggests that Cython is generating sensible code. If you can I'd test the equivalent code in C - if that works wrongly then it's a GCC problem. I don't know how easy that is for you. – DavidW Feb 21 '19 at 16:50
  • @DavidW: thanks again. I figured out that the problem is related to macOS 10.14 and Xcode. After installing gcc using Anaconda, I used this solution to fix my problem: https://stackoverflow.com/questions/52509602/cant-compile-c-program-on-a-mac-after-upgrade-to-mojave – aschein Feb 21 '19 at 17:33
  • Glad to hear it. If you think it would be helpful to others it's fine to write an answer to your own question (if you'd just be copying the linked solution then it probably isn't worth it) – DavidW Feb 21 '19 at 18:04
  • @DavidW added an answer! – aschein Feb 21 '19 at 18:32
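DavidW's first check (does the generated C contain the reduction pragma?) can be scripted. This is a minimal sketch, assuming the generated file is foo.c in the current directory (where cythonize writes it for the setup.py above) and that the accumulator is named sum, which Cython mangles to __pyx_v_sum as seen in the comments. The function name find_reduction_pragmas is hypothetical.

```python
import re
from pathlib import Path
from typing import List


def find_reduction_pragmas(c_source: str, var: str = "sum") -> List[str]:
    """Return every `#pragma omp ...` line in c_source that declares a `+`
    reduction over the Cython-mangled variable __pyx_v_<var>."""
    mangled = re.escape("__pyx_v_" + var)
    pattern = re.compile(
        r"^[ \t]*#pragma[ \t]+omp\b.*reduction\([ \t]*\+[ \t]*:[^)]*\b"
        + mangled
        + r"\b[^)]*\).*$",
        re.MULTILINE,
    )
    # Strip leading indentation so the lines print cleanly.
    return [m.group(0).strip() for m in pattern.finditer(c_source)]


if __name__ == "__main__":
    # Assumption: cythonize() wrote foo.c next to foo.pyx in this directory.
    path = Path("foo.c")
    if path.is_file():
        hits = find_reduction_pragmas(path.read_text())
        print("\n".join(hits) if hits else "no reduction pragma for __pyx_v_sum")
    else:
        print(f"{path} not found; run the build first")
```

As the thread shows, finding the pragma only proves Cython emitted sensible code; whether the compiler honors it is a separate question.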


I fixed this bug by first installing gcc using Anaconda:

conda install gcc

Then changing the lines in setup.py to use that new compiler:

if sys.platform == 'darwin':
    os.environ['CC'] = '/anaconda3/bin/gcc'
    os.environ['CXX'] = '/anaconda3/bin/g++'
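The hard-coded /anaconda3/bin paths only work for that one install. A variant could derive them from the prefix of the Python running setup.py. This is a sketch, assuming setup.py is run with the conda environment's own python (so sys.prefix is that environment) and that `conda install gcc` placed gcc and g++ in its bin directory; conda_compiler_env is a hypothetical helper name.

```python
import os
import sys
from typing import Dict


def conda_compiler_env(prefix: str = sys.prefix) -> Dict[str, str]:
    """Map CC/CXX to the gcc/g++ binaries inside a conda prefix.

    Assumption: `conda install gcc` put the compilers in <prefix>/bin.
    """
    bin_dir = os.path.join(prefix, "bin")
    return {"CC": os.path.join(bin_dir, "gcc"), "CXX": os.path.join(bin_dir, "g++")}


if sys.platform == "darwin":
    compilers = conda_compiler_env()
    # Only override the defaults if the conda gcc actually exists, so a
    # missing install falls back to the default compiler instead of failing.
    if os.path.exists(compilers["CC"]):
        os.environ.update(compilers)
```

Note that the error path quoted below points into /anaconda3/envs/python36, so which prefix holds the compiler depends on the environment gcc was installed into.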

Using the Anaconda gcc (instead of the brew-installed one I was using originally) didn't fix the problem right away. It wouldn't compile because of the following error:

/anaconda3/envs/python36/lib/gcc/x86_64-apple-darwin11.4.2/4.8.5/include-fixed/limits.h:168:61: fatal error: limits.h: No such file or directory
 #include_next <limits.h>  /* recurse down to the real one */

The problem here has to do with macOS 10.14 and Xcode 10.0. However, the solution given by @Maxxx in this related question worked for me. After installing the .pkg hidden in the Command Line Tools directory

/Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg

the code compiled and the parallelism worked as it was supposed to.

UPDATE: After updating to macOS Catalina, this fix no longer works because the .pkg file above no longer exists. I found a new solution by reading this related question. In my case, setting CPATH to the following path fixed the problem:

export CPATH=~/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include
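Since the SDK location differs between machines, setup.py could look it up at build time instead of relying on a shell export. This is a minimal sketch, assuming the xcrun tool from the Command Line Tools is on PATH and that the compiler honors CPATH (GCC treats it like directories given with -I). The helper names are hypothetical, and xcrun may report a different SDK root than the Xcode.app path above.

```python
import os
import subprocess
import sys
from typing import MutableMapping


def sdk_include_dir(sdk_root: str) -> str:
    """Return the C header directory inside a macOS SDK root."""
    return os.path.join(sdk_root, "usr", "include")


def prepend_cpath(env: MutableMapping[str, str], include_dir: str) -> None:
    """Put include_dir at the front of CPATH, keeping any existing entries."""
    existing = env.get("CPATH")
    env["CPATH"] = include_dir + os.pathsep + existing if existing else include_dir


if sys.platform == "darwin":
    # `xcrun --show-sdk-path` prints the root of the active macOS SDK.
    sdk_root = subprocess.check_output(["xcrun", "--show-sdk-path"], text=True).strip()
    prepend_cpath(os.environ, sdk_include_dir(sdk_root))
```

Setting os.environ this way affects only the setup.py process and the compiler processes it spawns, unlike an `export` in the shell profile.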
aschein
  • Just got the same problem. I installed gcc using brew (tried both gcc-8 and gcc-9). The solution you propose works, but I would like to understand in more detail where the problem is. It is very scary for me to have something like this in gcc v8 or v9 and not in the Anaconda gcc v4.8.5. – Marco Lombardi Aug 12 '19 at 18:20
  • I am in the same situation. I cannot figure out why this is happening. Is this a GCC problem? An OpenMP problem? – lasofivec Sep 11 '19 at 12:28