115

My work needs to use parallel techniques, and I am a new Python user. Could you share some material about the Python multiprocessing and subprocess modules? What is the difference between the two?

NoDataDumpNoContribution
Jun HU
  • Potentially also look at `greenlets` - but avoid those until you've understood the answers given to your OP – Jon Clements Nov 28 '12 at 14:12
  • Possible duplicate of [deciding among subprocess, multiprocessing, and thread in Python?](http://stackoverflow.com/questions/2629680/deciding-among-subprocess-multiprocessing-and-thread-in-python) – wombatonfire Sep 10 '16 at 20:25

3 Answers

148

The subprocess module lets you run and control other programs. Anything you can start from the command line can be run and controlled with this module. Use it to integrate external programs into your Python code.
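As a minimal sketch of that idea, the snippet below runs a child program and captures its output with `subprocess.run`. The helper name `run_child` is hypothetical, and the Python interpreter itself (`sys.executable`) stands in for the external program only so the example runs anywhere; in practice the command list would name whatever tool you want to integrate.

```python
import subprocess
import sys


def run_child():
    """Run an external command and return what it printed to stdout."""
    # Assumption for illustration: the "external program" is the Python
    # interpreter, so the sketch does not depend on any installed tool.
    cmd = [sys.executable, "-c", "print('hello from a child process')"]
    result = subprocess.run(
        cmd,
        capture_output=True,  # collect stdout/stderr instead of inheriting them
        text=True,            # decode bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_child())
```

Passing the command as a list avoids going through a shell, which is usually the safer default.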

The multiprocessing module lets you divide tasks written in Python over multiple processes to improve performance. It provides an API very similar to that of the threading module, offers ways to share data across the processes it creates, and makes managing multiple processes that run Python code (much) easier. In other words, multiprocessing lets you take advantage of multiple processes to get your tasks done faster by executing code in parallel.
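A small sketch of dividing Python work over several processes, assuming a toy task: `square` and `parallel_squares` are hypothetical names, and real workloads would replace `square` with something CPU-bound enough to outweigh the cost of starting processes.

```python
import multiprocessing


def square(n):
    """Stand-in task; it must be a top-level function so it can be pickled."""
    return n * n


def parallel_squares(numbers, processes=2):
    """Distribute square() over a pool of worker processes."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map() splits the input across the workers and keeps input order.
        return pool.map(square, numbers)


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # this module (for example Windows).
    print(parallel_squares(range(5)))
```

`Pool.map` mirrors the built-in `map`, which is part of why moving existing code to multiprocessing tends to need few changes.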

glglgl
Martijn Pieters
  • 14
    @glglgl: I am sorry I caused you pain. Thanks for cleaning that up. :-) Lets see if I can keep my apostrophes in check! – Martijn Pieters Nov 28 '12 at 14:24
  • 1
    "*The subprocess module lets you run and control other programs.*", sure, but does it use a thread or a process to run them? That's the interesting detail. Seems it uses a thread to create a new process asynchronously, and the thread continues to communicate with the process. – mins Jun 05 '21 at 09:35
  • 3
    @mins: the new process is run as a fork of the current process, which is then replaced by the child process via an `execv` or `execve` system call. No threads are involved nor would they help in creating the new process any faster. Communication is mostly handled by pipes (so, the OS). If you want to use threads, you can, or you could use asyncio. Your choice. – Martijn Pieters Jun 05 '21 at 13:48
  • Thanks, I guess fork/exec is equivalent to Windows [spawn](https://en.wikipedia.org/wiki/Spawn_(computing)). – mins Jun 05 '21 at 14:20
50

If you want to call an external program (especially one not written in Python), use subprocess.

If you want to call a Python function in a subprocess, use multiprocessing.

(If the program is written in Python, but is also importable, then I would try to call its functions using multiprocessing, rather than calling it externally through subprocess.)

unutbu
  • 3
    What if it is an external program and I want to run it on multiple processors? – Mooncrater Dec 17 '18 at 19:44
  • Do you want to run the external program multiple times, or do you want to run the external program as a single process in such a way as to take advantage of multiple processors? – unutbu Dec 17 '18 at 19:48
  • Multiple times, in parallel. To be precise, OCRing multiple image files. – Mooncrater Dec 17 '18 at 19:50
  • 2
    You can launch multiple [non-blocking `subprocess.call`s](https://stackoverflow.com/q/16071866/190597). Any modern OS should run the processes concurrently over the available processors. You could also launch the [subprocesses from a thread pool](https://stackoverflow.com/a/26783779/190597) to limit the number of subprocesses launched concurrently. – unutbu Dec 17 '18 at 19:59
  • Can you please look at [this](https://stackoverflow.com/questions/53838992/multiple-subprocesses-take-a-lot-of-time-to-complete)? – Mooncrater Dec 18 '18 at 18:25
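For the case raised in the comments above (running one external program many times in parallel while capping how many run at once), a rough sketch with a thread pool could look like this. The names `run_job` and `run_all` are hypothetical, and the Python interpreter stands in for the real external tool (such as an OCR program), which is an assumption made only to keep the example self-contained.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(name):
    """Run one external command for a single input and return its output."""
    # Stand-in command: upper-cases its argument. A real job would invoke
    # the actual external program with the file name instead.
    cmd = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())", name]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def run_all(names, max_workers=4):
    """Run the jobs concurrently, at most max_workers child processes at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Each thread just waits on its child process, so the real work
        # happens in separate OS processes that can use separate cores.
        return list(executor.map(run_job, names))


if __name__ == "__main__":
    print(run_all(["a.png", "b.png", "c.png"]))
```

Threads are enough here because each one spends its time blocked on a child process rather than executing Python code.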
23

Subprocess spawns new processes, but aside from stdin/stdout and whatever other APIs the other program may implement, you have no means to communicate with them. Its main purpose is to launch processes that are completely separate from your own program.

Multiprocessing also spawns new processes, but they run your code, and are designed to communicate with each other. You use it to divide tasks within your own program across multiple CPU cores.
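To illustrate the "designed to communicate" part, here is a minimal sketch in which a parent process hands work to a child through one `multiprocessing.Queue` and reads results back through another. The names `worker` and `double_all`, and the use of `None` as a stop marker, are illustrative choices rather than anything prescribed by the module.

```python
import multiprocessing


def worker(task_queue, result_queue):
    """Read items until the None sentinel arrives; send back each item doubled."""
    for item in iter(task_queue.get, None):
        result_queue.put(item * 2)


def double_all(values):
    """Send a list of values to a child process and collect its replies."""
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    proc = multiprocessing.Process(target=worker, args=(tasks, results))
    proc.start()
    for value in values:
        tasks.put(value)
    tasks.put(None)  # sentinel: tells the worker to stop
    # One worker reading a FIFO queue, so replies arrive in input order.
    output = [results.get() for _ in values]
    proc.join()
    return output


if __name__ == "__main__":
    print(double_all([1, 2, 3]))
```

The queues carry ordinary Python objects, which is the kind of channel a process started through subprocess does not get for free.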

DNS
  • "*Subprocess spawns new processes [...] Multiprocessing also spawns new processes, but they run your code*", what prevents the subprocess module from launching a process running your code? "*and are designed to communicate with each other*", if processes can communicate, what prevents the ones spawned by the subprocess module from communicating too? – mins Jun 05 '21 at 09:41
  • 1
    @mins: Processes spawned via `subprocess` can't (easily, if at all) take advantage of the communication mechanisms available to processes started using `multiprocessing.Process`, such as a [`multiprocessing.Manager`](https://docs.python.org/3/library/multiprocessing.html#managers). – martineau Nov 18 '21 at 00:23