Assume yo = Yo()
is a big object with a method double
, which returns its parameter multiplied by 2
.
If I pass yo.double
to imap
of multiprocessing
, then it is incredibly slow, because every function call creates a copy of yo
I think.
Ie, this is very slow:
from tqdm import tqdm
from multiprocessing import Pool
import numpy as np
class Yo:
def __init__(self):
self.a = np.random.random((10000000, 10))
def double(self, x):
return 2 * x
yo = Yo()
with Pool(4) as p:
for _ in tqdm(p.imap(yo.double, np.arange(1000))):
pass
Output:
0it [00:00, ?it/s]
1it [00:06, 6.54s/it]
2it [00:11, 6.17s/it]
3it [00:16, 5.60s/it]
4it [00:20, 5.13s/it]
...
BUT, if I wrap yo.double
with a function double_wrap
and pass it to imap
, then it is essentially instantaneous.
def double_wrap(x):
return yo.double(x)
with Pool(4) as p:
for _ in tqdm(p.imap(double_wrap, np.arange(1000))):
pass
Output:
0it [00:00, ?it/s]
1000it [00:00, 14919.34it/s]
How and why does wrapping the function change the behavior?
I use Python 3.6.6.