
I've recently discovered that the apply_async method in multiprocessing.Pool seems to have a limit on the size of the arguments passed in args=(). Is that really the case, or am I doing something wrong?

I've attached example code (Python 2) that simply prints the shape of the matrix in the called function. As you can see, at size 21000 the function is no longer being called (and no exception or warning is raised :\ )

import numpy as np
from multiprocessing import Pool

def _mp_print_size(mat):
    # Runs in a worker process; just reports the shape of the received matrix.
    try:
        print 'Mat size [%d %d]' % (mat.shape[0], mat.shape[1])
    except Warning as w:
        print w
    except Exception as e:
        print e

def diff_sizes(size):
    print 'Running with size ', size
    mat = np.ones([size, size])

    # Each apply_async call pickles mat and ships it to a worker.
    pool = Pool(2)
    for i in range(2):
        pool.apply_async(func=_mp_print_size, args=(mat, ))

    pool.close()
    pool.join()


if __name__ == '__main__':
    sizes = np.arange(1000, 1000000, 10000)
    for s in sizes:
        diff_sizes(s)

And the output of this is:

Running with size  1000
Mat size [1000 1000]
Mat size [1000 1000]
Running with size  11000
Mat size [11000 11000]
Mat size [11000 11000]
Running with size  21000
Running with size  31000
Running with size  41000
Running with size  51000
Running with size  61000

I'm sorry if this question has already been asked - links to previous instances will be much appreciated, as I could not find them myself.

  • Maybe relevant: http://stackoverflow.com/questions/10028809/maximum-size-for-multiprocessing-queue-item – ivan_pozdeev Dec 20 '15 at 22:52
  • A 21000x21000 matrix is rather large at 441,000,000 elements. It is pickled to be passed to the pool, and that'll end up being in the gigabytes. I got a MemoryError when I tried to pickle the array myself. It leaves me wondering whether multiprocessing silently ignores the error. With large objects like that you may want to use shared memory. – tdelaney Dec 20 '15 at 23:05
  • Are you running on linux? If so, `mat` already exists in the child process space so you don't have to send it. – tdelaney Dec 20 '15 at 23:10
  • tdelaney - I know, right?! The whole multithread/multiprocess story in Python is so lacking; I found out most things by trial and error :\ I am running it on Linux - could you elaborate more on that part? I can't just not pass it to the function and expect it to work; it doesn't compile. – Lichman Dec 21 '15 at 01:03
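To flesh out tdelaney's Linux comment: with the default fork start method, anything that exists in the parent before Pool() is created is inherited by the workers, so the matrix can live in a module-level global instead of being pickled through args. A minimal sketch under that assumption (the global _mat and the zero-argument worker are illustrative names, not part of the original code):

import numpy as np
from multiprocessing import Pool

_mat = None  # module-level global; forked workers inherit it

def _mp_print_size_global():
    # Reads the inherited global instead of a pickled argument.
    print('Mat size [%d %d]' % (_mat.shape[0], _mat.shape[1]))

def diff_sizes_global(size):
    global _mat
    _mat = np.ones([size, size])  # must be set BEFORE the Pool forks
    pool = Pool(2)
    for i in range(2):
        pool.apply_async(func=_mp_print_size_global)  # no args: nothing large is pickled
    pool.close()
    pool.join()

This only works with fork (the Linux default); under a spawn start the children begin fresh and would see _mat as None.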

1 Answer


This looks like a manifestation of http://bugs.python.org/issue8426, since multiprocessing.Pool.apply/map uses a Queue behind the scenes.
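Incidentally, the silence can be probed from the caller's side: keep the AsyncResult that apply_async returns and call .get() on it, which re-raises any exception from the worker; with a timeout it raises multiprocessing.TimeoutError if the task was never delivered at all. A sketch reworking the question's code:

import numpy as np
from multiprocessing import Pool, TimeoutError

def _mp_print_size(mat):
    print('Mat size [%d %d]' % (mat.shape[0], mat.shape[1]))

def diff_sizes_checked(size):
    mat = np.ones([size, size])
    pool = Pool(2)
    results = [pool.apply_async(func=_mp_print_size, args=(mat, ))
               for i in range(2)]
    pool.close()
    pool.join()
    for r in results:
        try:
            r.get(timeout=10)  # re-raises anything the worker hit
        except TimeoutError:
            print('task never completed - the payload was probably dropped')

if __name__ == '__main__':
    diff_sizes_checked(21000)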

I would add debug printing to the relevant calls inside multiprocessing to check whether the same calls are being made, with arguments of the same size, as in the bug's discussion.
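A cheaper first check is to measure the pickled payload in the parent process, since that is roughly what gets pushed through the queue (the task wrapper multiprocessing adds around it is only slightly larger):

import pickle
import numpy as np

# ~3.5 GB of float64 data; pickling may itself raise MemoryError, as tdelaney saw.
mat = np.ones([21000, 21000])
payload = pickle.dumps((mat, ), protocol=2)
print('payload: %d bytes' % len(payload))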

If that's the case, it would warrant filing another bug to make multiprocessing check the payload size and split the transfer into chunks.
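In the meantime, a user-level workaround in the same spirit is to keep each pickled payload small by splitting the matrix into row blocks before submitting, e.g. with numpy.array_split (a sketch of the idea, not the fix such a bug report would ask for):

import numpy as np
from multiprocessing import Pool

def _mp_print_block(block):
    print('Block size [%d %d]' % (block.shape[0], block.shape[1]))

if __name__ == '__main__':
    mat = np.ones([21000, 21000])
    pool = Pool(2)
    # 64 row blocks of ~55 MB each instead of one ~3.5 GB payload.
    results = [pool.apply_async(func=_mp_print_block, args=(b, ))
               for b in np.array_split(mat, 64)]
    pool.close()
    pool.join()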

– ivan_pozdeev