
I have two dense matrices of sizes (2500, 208) and (208, 2500), and I want to compute their product. This works fine and fast in a single process, but inside a multiprocessing block the processes get stuck there for hours. I multiply sparse matrices of even larger sizes without any problem. My code looks like this:

from multiprocessing import Pool
import numpy as np

def run_func(args):
    # Do stuff, including large sparse matrix multiplications.
    C = np.matmul(A, B)  # or A.dot(B), or even calling the BLAS dgemm(1, A, B) directly
    # Never gets past the line above!

with Pool(processes=agents) as pool:
    result = pool.starmap(run_func, args)

Note that when run_func is executed in a single process, it works fine. Multiprocessing on my local machine also works fine. It is only multiprocessing on the HPC that gets stuck. I allocate my resources like this:

srun -v --nodes=1 --time 7-0:0 --cpus-per-task=2 --mem-per-cpu=20G python3 -u run.py 2

where the last parameter is the number of agents in the code above. Here are the BLAS/LAPACK library details on the HPC (obtained from numpy):

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']

All Python packages and the Python version on the HPC are the same as on my local machine. Any leads on what is going on?

rando

2 Answers


As a workaround, I tried multithreading instead of multiprocessing, and the issue is now resolved. I am still not sure what the underlying problem with multiprocessing is, though.
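In case it helps others, here is a minimal sketch of that thread-based workaround, using multiprocessing.pool.ThreadPool as a drop-in replacement for Pool. The body of run_func below is only a placeholder reproducing the dense product from the question, and agents and args are filled with dummy values:

from multiprocessing.pool import ThreadPool  # thread-backed pool, same API as Pool
import numpy as np

def run_func(seed):
    # Placeholder workload: one dense (2500, 208) x (208, 2500) product.
    rng = np.random.default_rng(seed)
    A = rng.random((2500, 208))
    B = rng.random((208, 2500))
    return np.matmul(A, B)

agents = 2
args = [(0,), (1,)]  # one argument tuple per task

with ThreadPool(processes=agents) as pool:
    result = pool.starmap(run_func, args)

The only change from the original snippet is swapping Pool for ThreadPool; everything else, including the starmap call, stays the same.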

rando
    Please provide a detailed answer for self-answered questions as well so that others with a similar issue can be assisted. Thank you. – Akshay Sehgal Dec 21 '20 at 06:16

When using multiprocessing, variables are not shared among processes, since each process runs in its own memory space. When using threading, however, all threads run in the same memory space. That is why your threading-based solution worked.

Here you need to decide whether you need multiprocessing or threading. Threading is the more straightforward option, with no extra tricks needed to share objects, just like in your workaround. However, Python's Global Interpreter Lock (GIL) can become a performance bottleneck, since only one thread can hold the interpreter at a time.

On the other hand, multiprocessing lets you use multiple cores and CPUs, and it also avoids the GIL. If you choose multiprocessing, I would suggest using the Manager and Value classes from the multiprocessing module; with those you can still share objects among different processes and solve your problem. In this answer you can find a brief summary of these classes.
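As a minimal sketch of the Manager route, with a placeholder workload standing in for your run_func (the names task_id and shared_results are made up for the example): a Manager-backed dict is passed to every worker, each worker writes its result into it, and the parent process reads the results after the pool finishes.

from multiprocessing import Manager, Pool
import numpy as np

def run_func(task_id, shared_results):
    # Placeholder workload; each worker stores its product under its task id.
    rng = np.random.default_rng(task_id)
    A = rng.random((2500, 208))
    B = rng.random((208, 2500))
    shared_results[task_id] = np.matmul(A, B)

if __name__ == "__main__":
    with Manager() as manager:
        shared_results = manager.dict()  # proxy object shared across processes
        args = [(i, shared_results) for i in range(2)]
        with Pool(processes=2) as pool:
            pool.starmap(run_func, args)
        results = dict(shared_results)   # copy out before the manager shuts down

Value works the same way for single shared scalars. Keep in mind that anything stored through a Manager proxy is pickled and sent between processes, which adds overhead for large arrays.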

aargun