
I'm trying to apply Multiprocessing in my code and I ran into this example:

import multiprocessing
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

if __name__ == '__main__':
    names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
    with multiprocessing.Pool(processes=3) as pool:
        results = pool.starmap(merge_names, product(names, repeat=2))
    print(results)

This should take no more than a few seconds, but when I run it in Jupyter Notebook it never finishes; I have to restart the kernel. Are there any known issues with using multiprocessing under Jupyter or Anaconda?

I'm using

conda version 4.8.4
ipython version 5.8.0
user88484
  • I am having the same issue with just a pure Python 3.8 console, but I am getting this error: AttributeError: Can't get attribute 'merge_names' on – Vladimír Kunc Aug 28 '20 at 12:18

2 Answers


This is not really an answer, but since comments cannot nicely format code, I'll put it here. Your code does not work for me even in pure Python 3.8 (installed through conda, though), so I do not think it is connected to Jupyter or IPython.

This code works for me:

import multiprocessing
from itertools import product

names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
with multiprocessing.Pool(processes=3) as pool:
    results = pool.starmap('{} & {}'.format, product(names, repeat=2))
print(results)

thus it seems that there is some issue with pickling the custom function and sending it to the pool - I do not know the cause or the solution for that.
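A quick way to see why this matters (my own sketch, not part of the original answer): `pickle` serializes a plain function by reference, i.e. by its module and qualified name, so the worker process must be able to re-import that name. A function defined interactively lives in a `__main__` that the workers cannot re-import, which matches the truncated `AttributeError` in the comment above.

```python
import pickle

def merge_names(a, b):
    return '{} & {}'.format(a, b)

# The pickle payload contains no bytecode, only a reference:
# the function's module and qualified name.
payload = pickle.dumps(merge_names)
print(b'merge_names' in payload)  # True: the name is all that gets stored
```

Unpickling in the worker then amounts to `import module; module.merge_names`, which fails when the defining module is a notebook session.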

But if you just need similar functionality, I recommend joblib:

from joblib import Parallel, delayed
from itertools import product

def merge_names(a, b):
    return '{} & {}'.format(a, b)

names = ['Brown', 'Wilson', 'Bartlett', 'Rivera', 'Molloy', 'Opie']
result = Parallel(n_jobs=3, prefer="processes")(delayed(merge_names)(a, b) for a, b in product(names, repeat=2))
print(result)

joblib has a similar construct for a pool of workers, which can then be reused as needed:

with Parallel(n_jobs=2) as parallel:
   ...
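For completeness, a small sketch of my own showing one `Parallel` context reused for several batches of tasks (the built-in `pow` is used so no custom-function pickling is involved):

```python
from joblib import Parallel, delayed

# One pool of workers, reused for two rounds of tasks.
with Parallel(n_jobs=2) as parallel:
    squares = parallel(delayed(pow)(i, 2) for i in range(5))
    cubes = parallel(delayed(pow)(i, 3) for i in range(5))

print(squares)  # [0, 1, 4, 9, 16]
print(cubes)    # [0, 1, 8, 27, 64]
```

joblib's default loky backend serializes callables with cloudpickle, which is one reason interactively defined functions tend to survive the round trip there.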
Vladimír Kunc
  • What is this `product` that you use? From which package did you import it? – user88484 Aug 28 '20 at 12:40
  • @user88484 From the itertools package, as in the first example. I'll edit it so it is obvious - it is the same code as in the original question except for small changes. – Vladimír Kunc Aug 28 '20 at 12:42
  • Hi, also change the `results` at the end to `result` :) I'll mark your answer as accepted, though I still need to learn how to apply it to my real, more complicated functions; it seems the right way to do multiprocessing in Python – user88484 Aug 28 '20 at 12:46
  • @user88484 I have edited it, thanks. multiprocessing is one of the good ways to do it - I personally prefer the joblib library because it allows easy switching between threads and processes and is best for processing data in parallel. For multiple heterogeneous processes, multiprocessing is the way to go. Both approaches are pythonic; joblib actually used to be built on top of multiprocessing (they have since switched the default backend). – Vladimír Kunc Aug 28 '20 at 14:32

I've noticed this behaviour too. Everything seems to be working but it never finishes. Wrapping the variables in a tqdm progress bar shows the variables being loaded in but then nothing. It never ever finishes and Task Manager shows CPUs doing absolutely zero work.

Taking the function, putting it into a separate Python file, and then importing the function back in works for me.

Berg