
I want to compute the numerical gradient of a function in parallel. However, I noticed that evaluating my function in a multiprocessing pool gives a different result than evaluating it serially; the matrix multiplication seems to differ. I reproduced the problem in the code below. For smaller matrix sizes like (5, 2) the results are identical. I am using Python 3.7.4 and NumPy 1.18.5. Is there a way to get identical results for larger matrices as well?

import numpy as np
import multiprocessing as mp
import os

def setNumSubLibraryThreads(n):
    os.environ["OMP_NUM_THREADS"] = str(n)
    os.environ["OPENBLAS_NUM_THREADS"] = str(n)
    os.environ["MKL_NUM_THREADS"] = str(n)
    os.environ["VECLIB_MAXIMUM_THREADS"] = str(n)
    os.environ["NUMEXPR_NUM_THREADS"] = str(n)
    
def solveInParallel(f, args):
    cpu_count = mp.cpu_count()
    setNumSubLibraryThreads(1)
    p = mp.Pool(cpu_count)
    sol = np.array(p.map(f,args))
    p.close()
    setNumSubLibraryThreads(cpu_count)
    return sol

def g(args):
    mat1, mat2 = args
    return mat1.T.dot(mat2)

if __name__ == '__main__':
    np.random.seed(42)
    mat1 = np.random.random((1000,20))
    mat2 = np.random.random((1000,20))
    ref = mat1.T.dot(mat2)
    res_par_mat = solveInParallel(g, [(mat1, mat2)])
    print(res_par_mat[0] == ref)
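For context, the parallel numerical gradient I am aiming for looks roughly like this (a hypothetical sketch; `f` here is just a stand-in quadratic objective, and the central-difference step `h` is an arbitrary choice):

```python
import numpy as np
import multiprocessing as mp

def f(x):
    # stand-in objective; any scalar function of a 1-D array works
    return float(np.sum(x ** 2))

def numerical_gradient(x, h=1e-6):
    # build the 2*n perturbed points, evaluate them in the pool,
    # then combine pairs into central-difference quotients
    points = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        points.extend([x + e, x - e])
    with mp.Pool() as p:
        vals = np.array(p.map(f, points)).reshape(-1, 2)
    return (vals[:, 0] - vals[:, 1]) / (2 * h)

if __name__ == '__main__':
    x = np.array([1.0, 2.0, 3.0])
    print(numerical_gradient(x))  # ≈ [2. 4. 6.], the gradient of sum(x**2)
```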
`np.__config__.show()` gives:
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['...']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = [...]
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = [...]
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = [...]
    language = c
    define_macros = [('HAVE_CBLAS', None)]
  • Could you please post the result of `np.__config__.show()`? In my Python, which is not linked against any BLAS library, the results match. Also, could you try comparing the arrays with `np.allclose` instead of `==`, just to see if the results on your side are at least numerically close? – mandulaj Jan 12 '21 at 18:09
    `ref - res_par_mat` seems to be essentially round-off error, the maximum diff being `1.1368683772161603e-13` on my PC. – Yacola Jan 12 '21 at 18:10
  • Also, note that you can only set the various `NUM_THREADS` environment variables before `import numpy`; afterwards they have no effect, so you can't change the thread count this way. See https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy and check out https://pypi.org/project/threadpoolctl/ for thread-pool controls. – mandulaj Jan 12 '21 at 18:13
  • Comparing the results with np.allclose() returns True – S14p Jan 12 '21 at 18:14
  • @mandulaj I added the config above thanks for your input. I will check that. – S14p Jan 12 '21 at 18:28
  • In that case it's, as @Yacola said, just a round-off error. – mandulaj Jan 12 '21 at 18:34

1 Answer


It seems I have to live with that round-off error. Limiting the BLAS threads with `threadpoolctl`, as suggested in the comments, at least avoids fiddling with environment variables after import:

import numpy as np
import multiprocessing as mp
from threadpoolctl import threadpool_limits

def solveInParallel(f, args):
    cpu_count = mp.cpu_count()
    # limit BLAS to one thread per worker while the pool is alive
    with threadpool_limits(limits=1, user_api='blas'):
        with mp.Pool(cpu_count) as p:
            sol = np.array(p.map(f, args))
    return sol

def g2(args):
    mat1, mat2 = args
    return mat1.T.dot(mat2)

if __name__ == '__main__':
    np.random.seed(42)
    mat1 = np.random.random((1000,20))
    mat2 = np.random.random((1000,20))
    ref = mat1.T.dot(mat2)
    res = solveInParallel(g2, [(mat1, mat2)])
    # exact equality still fails element-wise; np.allclose(res[0], ref) returns True
    print(res[0] == ref)
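As the comments note, the remaining difference is pure floating-point round-off on the order of `1e-13`. A small sketch of a tolerance-based comparison, using a hypothetical perturbation of that magnitude in place of the actual single-threaded BLAS result:

```python
import numpy as np

np.random.seed(42)
a = np.random.random((1000, 20))
b = np.random.random((1000, 20))
ref = a.T.dot(b)

# hypothetical perturbation at the ~1e-13 level reported in the comments
res = ref + np.random.uniform(-1e-13, 1e-13, ref.shape)

print(np.array_equal(ref, res))  # exact comparison: False
print(np.allclose(ref, res))     # tolerance-based comparison: True
```

`np.allclose` uses `rtol=1e-05` and `atol=1e-08` by default, both of which comfortably absorb a `1e-13` discrepancy on entries of this magnitude.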