
I am running Mac OS X 10.6.8 and am using the Enthought Python Distribution. I want numpy functions to take advantage of both my cores. I am having a problem similar to the one in this post: multithreaded blas in python/numpy, but after following the steps described there, I still have the same problem. Here is my numpy.show_config():

lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'mkl_mc3', 'pthread']
    library_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/include']
blas_opt_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'mkl_mc3', 'pthread']
    library_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/include']
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'mkl_mc3', 'pthread']
    library_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/include']
blas_mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'mkl_mc3', 'pthread']
    library_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/include']
mkl_info:
    libraries = ['mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'mkl_mc', 'mkl_mc3', 'pthread']
    library_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/lib']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/Library/Frameworks/EPD64.framework/Versions/1.4.2/include']

As in the original post's comments, I deleted the line that set the variable MKL_NUM_THREADS=1. But even then, the numpy and scipy functions that should take advantage of multi-threading use only one of my cores at a time. Is there something else I should change?
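Since MKL reads its threading variables when numpy is first imported, one thing worth double-checking is that nothing in the environment still forces a single thread. A minimal sketch (the value "2" assumes a dual-core machine like mine):

```python
import os

# MKL picks these up when numpy is first imported, so set or clear them
# *before* the import. "2" assumes a dual-core machine.
os.environ.pop("MKL_NUM_THREADS", None)  # drop any forced single-thread setting
os.environ["OMP_NUM_THREADS"] = "2"      # harmless fallback for OpenMP builds

import numpy as np
np.show_config()  # confirm the MKL libraries are still the ones linked
```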

Edit: To clarify, I am trying to get a single calculation such as numpy.dot() to use multi-threading on its own, as per the MKL implementation. I am not trying to exploit the fact that numpy calculations release the GIL, which makes multi-threading across separate calls easier.

Here is a small script that should make use of multi-threading but does not on my machine:

import numpy as np

a = np.random.randn(1000, 10000)
b = np.random.randn(10000, 1000)

np.dot(a, b)  # this line should be multi-threaded
  • I've just tried: `python -mtimeit -s'import numpy as np; a = np.random.randn(1e3,1e3)' 'np.dot(a, a)'` It uses multiple cores. So at least in some configuration it can do it. – jfs Aug 03 '12 at 05:28
  • @J.F.Sebastian I am aware that it can, but I am trying to figure out what I am missing. – Nino Aug 03 '12 at 19:23
  • @J.F.Sebastian I just tried doing exactly what you did and got full use of my cores. The reason is that numpy calculations release the GIL, so when several calculations run in a for loop (as timeit does), each calculation can run in a different thread. What I am having trouble with, however, is the multi-threading of one calculation on its own. If I simply run a script similar to yours without using timeit (therefore no iterations), only one core is used at a time. – Nino Aug 07 '12 at 00:33
  • @Nino: `timeit` executes `np.dot()` sequentially. It is a synchronous operation; the next one doesn't start until the previous ends. All parallelism is inside `np.dot()`. – jfs Aug 07 '12 at 10:18
  • @J.F.Sebastian I can't explain it but it works now. Thanks for the help. – Nino Aug 08 '12 at 00:29

1 Answer


This article seems to imply that numpy intelligently makes certain operations parallel, depending on predicted speedup of the operation:

  • "If your numpy/scipy is compiled using one of these, then dot() will be computed in parallel (if this is faster) without you doing anything. "

Perhaps your small(-ish) test case doesn't show a significant speedup according to numpy's heuristic for deciding when to parallelize a particular dot() call? Maybe try a ridiculously large operation and see whether both cores are utilized?
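For example (the sizes here are arbitrary, just chosen to be large enough that threading should pay off; watch top or Activity Monitor while it runs):

```python
import time
import numpy as np

n = 2000  # arbitrary; large enough that the product takes on the order of a second
a = np.random.randn(n, n)
b = np.random.randn(n, n)

t0 = time.time()
c = np.dot(a, b)  # the call that should fan out across cores
print("%.2f s for a %dx%d dot product" % (time.time() - t0, n, n))
```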

As a side note, does your processor/machine configuration actually support BLAS?

  • It does support BLAS, but that is beside the point because my numpy is linked to MKL. So, strangely enough, just out of frustration, I tried running that script above again, and now it works. Very perplexing... But now I am good to go, and since you answered, 50 points to you. Thanks. – Nino Aug 07 '12 at 20:49