I'm trying to parallelize some Python code using processes and concurrent.futures. It looks like I can execute a function multiple times in parallel either by submitting calls and then calling Future.result() on the futures, or by using Executor.map().
I'm wondering whether the latter is just syntactic sugar for the former and whether there's any difference performance-wise. It isn't immediately clear from the documentation.

planetp

1 Answer

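Both approaches let you execute a function multiple times concurrently, which is not necessarily the same as true parallel execution. Whether the calls actually run in parallel depends on the executor you use and on the number of CPU cores available.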

Performance-wise, I recently found that ProcessPoolExecutor.submit() and ProcessPoolExecutor.map() took the same amount of compute time to complete the same task. Note: .submit() returns a Future object (let's call it f), and you need to call its f.result() method to get the result. .map(), on the other hand, directly returns an iterator over the results.
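To make the difference in return values concrete, here is a minimal sketch. The function square and the helpers run_with_submit and run_with_map are hypothetical names introduced for illustration; they are not from my benchmark code. The helpers accept any concurrent.futures executor.

```python
from concurrent.futures import ProcessPoolExecutor


def square(x):
    """Hypothetical stand-in for a CPU-bound task."""
    return x * x


def run_with_submit(executor, func, inputs):
    # submit() schedules a single call and returns a Future immediately.
    futures = [executor.submit(func, x) for x in inputs]
    # result() blocks until that particular call has finished.
    return [f.result() for f in futures]


def run_with_map(executor, func, inputs):
    # map() schedules all calls and returns an iterator that yields
    # the results in the order of the inputs.
    return list(executor.map(func, inputs))


if __name__ == "__main__":
    numbers = range(8)
    with ProcessPoolExecutor() as executor:
        print(run_with_submit(executor, square, numbers))
        print(run_with_map(executor, square, numbers))
```

Both helpers return the results in input order here, because the futures are read back in the order they were submitted.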

When converting their results into an ordered list using sorted(), I found that the total compute time of the .map() code can be lower than that of the .submit() code in certain scenarios.

When converting their results into an unordered list using list(), the total compute time of the .submit() and .map() code is the same. Both of these variants also ran faster than the ones using sorted().
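One way ordering can come into play is sketched below. This is an assumption about a typical setup, not a copy of my benchmark: results collected with as_completed() arrive in completion order and may need sorting, whereas .map() already yields them in input order. The names delayed_identity, results_in_completion_order and results_in_input_order are hypothetical.

```python
import time
from concurrent.futures import ProcessPoolExecutor, as_completed


def delayed_identity(x):
    """Hypothetical task: earlier inputs sleep longer, so they finish later."""
    time.sleep(0.01 * (5 - x))
    return x


def results_in_completion_order(executor, func, inputs):
    futures = [executor.submit(func, x) for x in inputs]
    # as_completed() yields each future as soon as it finishes,
    # so the order of the results is not guaranteed.
    return [f.result() for f in as_completed(futures)]


def results_in_input_order(executor, func, inputs):
    # map() preserves the input order, so no sorting is needed afterwards.
    return list(executor.map(func, inputs))


if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        unordered = results_in_completion_order(executor, delayed_identity, range(5))
        print(sorted(unordered))  # extra sorting step to restore the order
        print(results_in_input_order(executor, delayed_identity, range(5)))
```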

You can read the details in my answer. There, I have also shared my code so you can see how it works. I hope it is helpful to you.

I have not used ThreadPoolExecutor, so I can't comment in detail. However, I have read that it exposes the same interface as ProcessPoolExecutor and is better suited to I/O-bound tasks than to CPU-bound tasks. In Python versions before 3.5 you had to specify its max_workers argument, i.e. the maximum number of threads; since Python 3.5 it is optional and defaults to a value derived from os.cpu_count(). In ProcessPoolExecutor, max_workers is an optional argument which defaults to the number of CPUs returned by os.cpu_count().
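As a rough sketch of the thread-based variant, the example below passes max_workers explicitly, which works on any Python version that ships concurrent.futures. The function text_length and the helper run_in_threads are hypothetical names used only for illustration, and text_length merely stands in for an I/O-bound call.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def text_length(text):
    """Hypothetical stand-in for an I/O-bound task such as a download."""
    return len(text)


def run_in_threads(func, inputs, max_workers=4):
    # max_workers is the maximum number of threads in the pool.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Same Executor interface as ProcessPoolExecutor: map() or submit().
        return list(executor.map(func, inputs))


if __name__ == "__main__":
    print("CPUs reported by os.cpu_count():", os.cpu_count())
    print(run_in_threads(text_length, ["a", "bb", "ccc"]))
```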

Sun Bear