How multiprocessing
works, in a nutshell:
Process()
spawns (fork
or similar on Unix-like systems) a copy of the original program (on Windows, which lacks a real fork
, this is tricky and requires the special care that the module documentation notes).
- The copy communicates with the original to figure out that (a) it's a copy and (b) it should go off and invoke the
target=
function (see below).
- At this point, the original and copy are now different and independent, and can run simultaneously.
Since these are independent processes, they now have independent Global Interpreter Locks (in CPython) so both can use up to 100% of a CPU on a multi-cpu box, as long as they don't contend for other lower-level (OS) resources. That's the "multiprocessing" part.
Of course, at some point you have to send data back and forth between these supposedly-independent processes, e.g., to send results from one (or many) worker process(es) back to a "main" process. (There is the occasional exception where everyone's completely independent, but it's rare ... plus there's the whole start-up sequence itself, kicked off by p.start()
.) So each created Process
instance—p
, in the above example—has a communications channel to its parent creator and vice versa (it's a symmetric connection). The multiprocessing
module uses the pickle
module to turn data into strings—the same strings you can stash in files with pickle.dump
—and sends the data across the channel, "downwards" to workers to send arguments and such, and "upwards" from workers to send back results.
Eventually, once you're all done with getting results, the worker finishes (by returning from the target=
function) and tells the parent it's done. To make sure everything gets closed and cleaned-up, the parent should call p.join()
to wait for the worker's "I'm done" message (actually an OS-level exit
on Unix-ish sysems).
The example is a little bit silly since the two printed messages take basically no time at all, so running them "at the same time" has no measurable gain. But suppose instead of just printing hello
, f
were to calculate the first 100,000 digits of π (3.14159...). You could then spawn another Process
, p2
with a different target g
that calculates the first 100,000 digits of e (2.71828...). These would run independently. The parent could then call p.join()
and p2.join()
to wait for both to complete (or spawn yet more workers to do more work and occupy more CPUs, or even go off and do its own work for a while first).