
I recently joined a project written in Python. All files are analyzed serially, one after the other, so the whole run takes approximately 1 hour to complete. My task is to reduce this time.

I am not sure what I should use in Python, multithreading or multiprocessing, to run the file analysis in parallel.

Please suggest.

Thank you.

  • What type of analysis is being performed? Also, are you looking to process multiple files in parallel or multiple parts of each file in parallel? – Tankobot Nov 18 '16 at 07:18
  • I am trying to run multiple files in parallel so that I can reduce the completion time. After some research on Google, I learned that I should use "Popen" in Python to do so. I am trying to get a grip on this area by running some examples. :) – Shraddha Agrawal Nov 21 '16 at 06:22
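
For reference, a minimal sketch of that Popen approach, assuming the analysis lives in a separate script (analyze.py and the file list below are placeholders) that takes a file path as its argument:

    import subprocess
    import sys

    files = ["data1.txt", "data2.txt", "data3.txt"]  # placeholder file list

    # Launch one analysis process per file without waiting for each to finish...
    procs = [subprocess.Popen([sys.executable, "analyze.py", f]) for f in files]

    # ...then wait for all of them to complete.
    for p in procs:
        p.wait()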

2 Answers


Maybe this is what you're looking for: https://nathangrigg.com/2015/04/python-threading-vs-processes


Do you also need a brief description of the two and their differences?

In short, if your sub-problems spend most of their time waiting for something else to finish, multithreading is a good fit (for example, I/O-heavy operations); by contrast, if your sub-problems can genuinely run at the same time (CPU-heavy work), multiprocessing is suggested. However, don't create more processes than you have cores. Maybe this will explain it better for you.
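
For completeness, here is a minimal sketch of both approaches using concurrent.futures; analyze_file, the worker counts, and the file names are placeholders for whatever your project actually does:

    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
    import os

    def analyze_file(path):
        # Placeholder analysis: just count the characters in the file.
        with open(path) as f:
            return path, len(f.read())

    if __name__ == "__main__":
        files = ["a.txt", "b.txt", "c.txt"]

        # I/O-bound work (mostly waiting on disk or network): threads are usually enough.
        with ThreadPoolExecutor(max_workers=8) as pool:
            io_results = list(pool.map(analyze_file, files))

        # CPU-bound work: separate processes sidestep the GIL;
        # one worker per core is a sensible cap.
        with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
            cpu_results = list(pool.map(analyze_file, files))

The __main__ guard matters for ProcessPoolExecutor on platforms that spawn a fresh interpreter for each worker (e.g. Windows).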

Shubham R

It heavily depends on the type of analysis. Here is a simple rule of thumb to give hints:

  • if the process is memory bound, keep it serial
  • if it is I/O bound, use multithreading; the optimal number of threads depends on the percentage of time spent waiting on I/O
  • if it is CPU bound, use multiprocessing with a number of processes equal to the number of available cores

If you cannot be sure a priori, just experiment... But never forget that no method is absolutely better than the others in all possible use cases.
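
One rough way to run that experiment is to time each strategy on a sample of your files; analyze_file, the worker count of 4, and sample_files below are placeholders for your own analysis and inputs:

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def analyze_file(path):
        # Placeholder analysis: just count the characters in the file.
        with open(path) as f:
            return len(f.read())

    def run_serial(files):
        return [analyze_file(f) for f in files]

    def run_threads(files):
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(analyze_file, files))

    def run_processes(files):
        with ProcessPoolExecutor(max_workers=4) as pool:
            return list(pool.map(analyze_file, files))

    if __name__ == "__main__":
        sample_files = ["a.txt", "b.txt", "c.txt"]
        for label, runner in [("serial", run_serial),
                              ("threads", run_threads),
                              ("processes", run_processes)]:
            start = time.perf_counter()
            runner(sample_files)
            print(f"{label}: {time.perf_counter() - start:.2f}s")

Whichever variant is fastest on a representative sample is a reasonable default for the full run.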

Serge Ballesta