I am trying out multiprocessor programming with Python. Take a divide-and-conquer algorithm like Fibonacci, for example. The program's flow of execution branches out like a tree, and the branches execute in parallel. In other words, we have an example of nested parallelism.
Coming from Java, I have used a thread pool pattern to manage resources, since the program could branch out very quickly and create too many short-lived threads. A single static (shared) thread pool can be instantiated via ExecutorService.
I would expect the same for Pool, but it appears that a Pool object is not meant to be globally shared. For example, sharing the Pool using multiprocessing.Manager.Namespace() leads to the error:
pool objects cannot be passed between processes or pickled
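For reference, here is a minimal sketch that triggers the same message without a Manager, simply by trying to pickle a pool. It assumes that multiprocessing.pool.ThreadPool, which subclasses Pool, inherits the same pickling guard; I use it here only so the sketch does not spawn worker processes. The helper name pickling_error_message is made up for illustration.

```python
import pickle
from multiprocessing.pool import ThreadPool


def pickling_error_message():
    """Try to pickle a pool and return the error message, if any.

    Hypothetical helper: ThreadPool stands in for multiprocessing.Pool,
    on the assumption that both share the same pickling guard.
    """
    with ThreadPool(1) as pool:
        try:
            # Sending an object to another process requires pickling it.
            pickle.dumps(pool)
        except NotImplementedError as exc:
            return str(exc)
    return None


if __name__ == "__main__":
    print(pickling_error_message())
```

So the error seems to come from the pool refusing to be serialized at all, rather than from anything specific to Namespace.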
I have a 2-part question:
- What am I missing here; why shouldn't a Pool be shared between processes?
- What is a pattern for implementing nested parallelism in Python? If possible, maintaining a recursive structure, and not trading it for iteration.
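Python
from concurrent.futures import ThreadPoolExecutor

def fibonacci(n):
    if n < 2:
        return n
    # Submit both subproblems to the shared (global) pool.
    a = pool.submit(fibonacci, n - 1)
    b = pool.submit(fibonacci, n - 2)
    # Block this worker until both children have finished.
    return a.result() + b.result()

def main():
    global pool
    N = 10
    # Oversized pool so that blocked parent tasks never starve their children.
    with ThreadPoolExecutor(2**N) as pool:
        print(fibonacci(N))

main()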
Java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FibTask implements Callable<Integer> {
    // Single static pool shared by every task.
    public static ExecutorService pool = Executors.newCachedThreadPool();
    int arg;

    public FibTask(int n) {
        this.arg = n;
    }

    @Override
    public Integer call() throws Exception {
        if (this.arg > 2) {
            // Submit both subproblems to the shared pool, then wait for them.
            Future<Integer> left = pool.submit(new FibTask(arg - 1));
            Future<Integer> right = pool.submit(new FibTask(arg - 2));
            return left.get() + right.get();
        } else {
            return 1;
        }
    }

    public static void main(String[] args) throws Exception {
        Integer n = 14;
        Callable<Integer> task = new FibTask(n);
        Future<Integer> result = FibTask.pool.submit(task);
        System.out.println(Integer.toString(result.get()));
        FibTask.pool.shutdown();
    }
}
I'm not sure if it matters here, but I am ignoring the difference between "process" and "thread"; to me they both mean "virtualized processor". My understanding is that the purpose of a Pool is to share a "pool" of resources. Running tasks can make requests to the Pool. As parallel tasks complete on other threads, those threads can be reclaimed and assigned to new tasks. It doesn't make sense to me to disallow sharing of the pool, so that each thread must instantiate its own new pool, since that would seem to defeat the purpose of a thread pool.
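To make concrete what I mean by threads being reclaimed, here is a small sketch using ThreadPoolExecutor: eight tasks go to a pool of two workers, and each task reports which thread ran it. The function names worker_name and distinct_workers are my own, chosen just for this illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def worker_name(_):
    """Return the name of the pool thread that executed this task."""
    time.sleep(0.01)  # Simulate a short unit of work.
    return threading.current_thread().name


def distinct_workers(num_tasks=8, max_workers=2):
    """Run num_tasks tasks and return the set of thread names that ran them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return set(pool.map(worker_name, range(num_tasks)))


if __name__ == "__main__":
    names = distinct_workers()
    # There are never more distinct threads than max_workers.
    print(len(names), "thread(s) served 8 tasks")
```

This is the behaviour I would like to keep when tasks themselves submit further tasks: a fixed set of workers being reused, rather than a new pool per task.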