
Using `multiprocessing.Pool` I can split an input list for a single function to be processed in parallel across multiple CPUs. Like this:

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    pool = Pool(processes=4)
    results = pool.map(f, range(100))
    pool.close()
    pool.join()    

However, this does not allow running different functions on different processors. If I want to do something like this, in parallel / simultaneously:

foo1(args1) --> Processor1
foo2(args2) --> Processor2

How can this be done?

Edit: After Darkonaut's remarks, I do not care about specifically assigning foo1 to processor number 1. It can be any processor, as chosen by the OS. I am just interested in running independent functions in different / parallel processes. So rather:

foo1(args1) --> process1
foo2(args2) --> process2
hirschme
  • See [Designate specific CPU for a process - python multiprocessing](https://stackoverflow.com/questions/36172101/designate-specific-cpu-for-a-process-python-multiprocessing) – Kos Jun 06 '19 at 16:20
  • @Kos does this really parallelize the code? Or does it just assign which processor to use, but still run the code sequentially? – hirschme Jun 06 '19 at 16:30
  • I believe it does, see what happens if you remove `time.sleep(1 + 3 * worker)` – Kos Jun 06 '19 at 18:34
  • 1
    Also if I replace `p.cpu_affinity([worker])` by `p.cpu_affinity([worker // 2])` then I can see half of my cores idling. Looks like it does the trick. (I'm testing on Ubuntu.) – Kos Jun 06 '19 at 18:37
  • How many functions and how many cores do you have? Does a function execute more than once, with different arguments? Are you sure you want to map function execution directly to specific cores and not just processes? This level of micromanagement would be rather unusual and not necessary if you just care about parallel execution of different functions. – Darkonaut Jun 06 '19 at 19:10
  • @Darkonaut I just care about parallel execution of different functions (let's just assume 2 functions and two processors, each executed only once). I am not sure what you mean by specific cores and not just processes; I never mentioned the word core. Could you explain what you mean by that? – hirschme Jun 06 '19 at 19:15
  • 1
    A "processor" is hardware, a "process" is the software abstraction for an execution context. When you fiddle with cpu-affinity you're targeting hardware directly. Usually you would let the OS decide on which core (processor) to execute your processes. – Darkonaut Jun 06 '19 at 19:21
  • @Darkonaut right, I do only care about parallel processes, regardless if this means using different Processors/Cores (or different threads within a processor). The assignment of which piece of hardware does this I can leave to the OS – hirschme Jun 06 '19 at 19:24
  • I have such a solution [here](https://stackoverflow.com/a/52992065/9059420), you just have to switch the `ThreadPool` with the regular process-Pool, it's the same API. – Darkonaut Jun 06 '19 at 19:56

1 Answer


I usually find it easiest to use the concurrent.futures module for concurrency. You can achieve the same with multiprocessing, but concurrent.futures has (IMO) a much nicer interface.

Your example would then be:

from concurrent.futures import ProcessPoolExecutor


def foo1(x):
    return x * x


def foo2(x):
    return x * x * x


if __name__ == '__main__':
    with ProcessPoolExecutor(2) as executor:
        # these return immediately and are executed in parallel, on separate processes
        future_1 = executor.submit(foo1, 1)
        future_2 = executor.submit(foo2, 2)
    # get results / re-raise exceptions that were thrown in workers
    result_1 = future_1.result()  # contains foo1(1)
    result_2 = future_2.result()  # contains foo2(2)

If you have many inputs, it is better to use executor.map with the chunksize argument instead:

from concurrent.futures import ProcessPoolExecutor


def foo1(x):
    return x * x


def foo2(x):
    return x * x * x


if __name__ == '__main__':
    with ProcessPoolExecutor(4) as executor:
        # these return immediately and are executed in parallel, on separate processes
        future_1 = executor.map(foo1, range(10000), chunksize=100)
        future_2 = executor.map(foo2, range(10000), chunksize=100)
    # executor.map returns an iterator which we have to consume to get the results
    result_1 = list(future_1)  # contains [foo1(x) for x in range(10000)]
    result_2 = list(future_2)  # contains [foo2(x) for x in range(10000)]

Note that the optimal values for chunksize, the number of processes, and whether process-based concurrency actually leads to increased performance depend on many factors:

  • The runtime of foo1 / foo2. If they are extremely cheap (as in this example), the communication overhead between processes might dominate the total runtime.

  • Spawning a process takes time, so the code inside the `with ProcessPoolExecutor` block needs to run long enough for this cost to amortize.

  • The actual number of physical processors in the machine you are running on.

  • Whether your application is IO bound or compute bound.

  • Whether the functions you use in foo are already parallelized (such as some np.linalg solvers, or scikit-learn estimators).

Dion
  • this looks great and very easy to implement. However if I compare the timing against running both functions in sequence I get a x5 faster runtime in the latter: I redefined `def foo1(x): return [a*a for a in x]` . Is this an error on my side or is this solution just not very efficient? – hirschme Jun 06 '19 at 19:53
  • 1
    Spawning a process (or 4 in this case) takes time, so your functions need to be somewhat expensive for this to pay off. There is no silver bullet here. If your application is data-parallel you can pass many arguments at once via `executor.map`. If you run foo multiple times during your application it usually makes sense to create the executor once and re-use that. We need to know more about your specific use case to help you with performance optimization. – Dion Jun 06 '19 at 19:56
  • See my edit for a more in-depth discussion on how and when you might want to use multiprocessing. – Dion Jun 06 '19 at 20:31
  • In which sense does IO- vs compute-bound application influence parallelization performance? I would have thought that either one should speed up when parallelizing? – hirschme Jun 06 '19 at 20:41
  • If your raw disk or network speed is the bottleneck spawning additional processes will grant you little speedup. There *are* situations where IO-bounds tasks greatly profit from concurrency - I'm just saying it depends. – Dion Jun 06 '19 at 20:44
  • one last question about your post. What does it mean that you spawn 4 processes for running two functions? Especially in the first example where each function only runs one operation. How is this split up into 4 processes? – hirschme Jun 06 '19 at 20:47
  • 1
    It is not split up into 4 processes. Spawning 4 processes for 2 functions is indeed nonsensical (edited). I took the 4 from your original question. If you use `executor.map` however, `concurrent.futures` does use all processes (it just parallelizes over the data). – Dion Jun 06 '19 at 20:48