I am running into a bizarre problem that I can't explain. I'm hoping someone out there can help, please!

I'm running Python 2.7.3 and Scipy v0.14.0, and I'm trying to implement some very simple multiprocessing to speed up my code using the multiprocessing module. I've managed to make a basic example work:

import multiprocessing
import numpy as np
import time
# import scipy.special  # uncommenting this is what triggers the slowdown below


def compute_something(t):
    # busy-work: repeated square roots so each task takes measurable time
    a = 0.
    for i in range(100000):
        a = np.sqrt(t)
    return a

if __name__ == '__main__':

    pool_size = multiprocessing.cpu_count()
    print "Pool size:", pool_size
    pool = multiprocessing.Pool(processes=pool_size)

    inputs = range(10)

    # serial baseline using the built-in map
    tic = time.time()
    builtin_outputs = map(compute_something, inputs)
    print 'Built-in:', time.time() - tic

    # parallel version using the worker pool
    tic = time.time()
    pool_outputs = pool.map(compute_something, inputs)
    print 'Pool    :', time.time() - tic

This runs fine, returning

Pool size: 8
Built-in: 1.56904006004
Pool    : 0.447728157043

But if I uncomment the line import scipy.special, I get:

Pool size: 8
Built-in: 1.58968091011
Pool    : 1.59387993813

and I can see that only one core is doing the work on my system. In fact, importing any module from the scipy package seems to have this effect (I've tried several).

Any ideas? I've never seen a case like this before, where an apparently innocuous import can have such a strange and unexpected effect.

Thanks!

Update (1)

Moving the scipy import line into the function compute_something partially mitigates the problem:

Pool size: 8
Built-in: 1.66807389259
Pool    : 0.596321105957
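Concretely, the change is just this (a sketch - only the worker changes, and the top-level import scipy.special line is removed so the parent process never imports scipy):

def compute_something(t):
    import scipy.special  # deferred: only the pool workers pay for this import
    a = 0.
    for i in range(100000):
        a = np.sqrt(t)
    return a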

Update (2)

Thanks to @larsmans for testing on a different system. The problem could not be reproduced with Scipy v0.12.0. I'm moving this query to the scipy mailing list and will post any answers here.

Gabriel
  • Cannot reproduce with Python 2.7.5, SciPy 0.12.0. Which version are you using? – Fred Foo May 08 '14 at 09:59
  • Interesting, thanks for trying! I'm using 0.14.0b1. I need some of the more recent modules, hence using a more recent version. – Gabriel May 08 '14 at 10:02
  • I suggest trying the stable version as well -- and if that fixes the problem, try contacting the SciPy mailing list. Debugging a beta version of a library isn't really SO stuff. – Fred Foo May 08 '14 at 10:03
  • Good point, thanks for your help. In this case, is it best practice to 'answer' the question, or update with this info and leave open? – Gabriel May 08 '14 at 10:06
  • Problem still confirmed in stable version 0.14.0 – Gabriel May 08 '14 at 10:20
  • You can either delete it or leave it open and later answer it yourself if you can get the problem resolved using the SciPy ML. – Fred Foo May 08 '14 at 11:23
  • My guess is that one of the FORTRAN extension modules in `scipy.special` was compiled in such a way to use OpenMP, and that importing it caused the OpenMP runtime to set the CPU affinity for the parent process, and the child processes inherited that setting. When you come to the SciPy mailing list, please tell us how you built scipy in detail, what FORTRAN compiler you used, what platform you are on, etc. Thanks. – Robert Kern May 08 '14 at 13:28

1 Answer


After much digging around and posting an issue on the Scipy GitHub site, I've found a solution.

Before I start, this is documented very well here - I'll just give an overview.

This problem is not related to the version of Scipy or Numpy that I was using. It originates in the system BLAS libraries that Numpy and Scipy use for various linear algebra routines. You can tell which libraries Numpy is linked against by running

python -c 'import numpy; numpy.show_config()'
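On a machine linked against OpenBLAS, the output includes a section roughly like this (illustrative only - the exact fields vary with the Numpy version and build):

openblas_info:
    libraries = ['openblas']
    library_dirs = ['/usr/lib']
    language = f77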

If you are using OpenBLAS on Linux, you may find that the CPU affinity mask is set to 1, meaning that once these libraries are imported in Python (via Numpy/Scipy), the process can use at most one core of the CPU. To test this, run the following within a Python session:

import os
# print the current process's CPU affinity mask
os.system('taskset -p %s' % os.getpid())
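Before any scipy import, this prints something along these lines (illustrative - the pid will differ):

pid 2029's current affinity mask: ff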

If the CPU affinity is returned as f or ff, you can access multiple cores: the mask is hexadecimal, so f is binary 1111 (cores 0-3), ff covers cores 0-7, and 1 means core 0 only. In my case it would start out like that, but upon importing numpy or scipy.any_module, it would switch to 1, hence my problem.

I've found two solutions:

Change CPU affinity

You can manually reset the CPU affinity of the master process at the top of the __main__ block, so that the code looks like this:

import multiprocessing
import numpy as np  # importing numpy is what clobbers the affinity mask
import math
import time
import os

def compute_something(t):
    # busy-work so each task takes measurable time
    a = 0.
    for i in range(10000000):
        a = math.sqrt(t)
    return a

if __name__ == '__main__':

    pool_size = multiprocessing.cpu_count()
    # reset this process's affinity to all cores (numbered 0 to pool_size - 1)
    os.system('taskset -cp 0-%d %s' % (pool_size - 1, os.getpid()))

    print "Pool size:", pool_size
    pool = multiprocessing.Pool(processes=pool_size)

    inputs = range(10)

    tic = time.time()
    builtin_outputs = map(compute_something, inputs)
    print 'Built-in:', time.time() - tic

    tic = time.time()
    pool_outputs = pool.map(compute_something, inputs)
    print 'Pool    :', time.time() - tic

Note that passing taskset a range that extends beyond the last available core doesn't seem to matter - it just uses the maximum possible number of cores.
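As an aside, on Python 3.4+ (not the Python 2.7 used here) the same reset can be done from within Python on Linux, without shelling out to taskset - a minimal sketch:

import os

# pid 0 means the calling process; allow it to run on every core again
os.sched_setaffinity(0, range(os.cpu_count()))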

Switch BLAS libraries

The solution is documented at the site linked above. Basically: install ATLAS (the libatlas package) and run update-alternatives to point Numpy at ATLAS rather than OpenBLAS.
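On Debian/Ubuntu that amounts to something along these lines (a sketch - the package name and the alternative names, e.g. libblas.so.3 vs libblas.so.3gf, vary by release):

# install ATLAS
sudo apt-get install libatlas3-base
# select the ATLAS implementation for BLAS and LAPACK
sudo update-alternatives --config libblas.so.3
sudo update-alternatives --config liblapack.so.3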

Gabriel