Why is the spawn method much slower than the fork method in Python multiprocessing?

Question

I was experimenting with different start methods in the multiprocessing module and found something odd. Changing the variable method from "spawn" to "fork" drops the execution time from about 9.5 seconds to just 0.5 seconds.

import multiprocessing as mp
from multiprocessing import Process, Value
from time import time


def increment_value(shared_integer):
    with shared_integer.get_lock():
        shared_integer.value += 1


if __name__ == "__main__":
    method = "spawn"
    mp.set_start_method(method)

    start = time()
    for _ in range(200):
        integer = Value("i", 0)
        procs = [
            Process(target=increment_value, args=(integer,)),
            Process(target=increment_value, args=(integer,)),
        ]

        for p in procs:
            p.start()
        for p in procs:
            p.join()

        assert integer.value == 2

    print(f"{method} - Finished in {time() - start:.4f} seconds")

Output from two separate runs:

spawn - Finished in 9.4275 seconds
fork - Finished in 0.5316 seconds

I'm aware of how these two methods start a new child process (well explained here), but this difference puts a big question mark in my head. I would like to know exactly which part of the code has the biggest impact on performance. Is it the pickling step in "spawn"? Does it have anything to do with the lock?

I'm running this code on Linux Pop!_OS and my interpreter version is 3.11.

Did you happen to see: [multiprocessing fork() vs spawn()](https://stackoverflow.com/questions/64095876/multiprocessing-fork-vs-spawn)? (duplicate?) — JonSG, Aug 15 '23 at 19:45
@JonSG Yes I did. So is it just the starting method that is much slower? I don't know why I didn't expect this number and I guessed there must be something else as well. — S.B, Aug 15 '23 at 19:48
I don't actually know myself. I just have read that post a few times in the past while helping people debug spawn/fork problems on windows. — JonSG, Aug 15 '23 at 19:49
Does this answer your question? [multiprocessing fork() vs spawn()](https://stackoverflow.com/questions/64095876/multiprocessing-fork-vs-spawn) — Dubious Denise, Aug 15 '23 at 19:50
Your processes are doing basically no work at all; your total time is dominated by the startup time of each process - which is indeed higher for spawn vs. fork. You wouldn't see this magnitude of difference in any realistic use of multiprocessing. — jasonharper, Aug 15 '23 at 19:52

score 1 · Accepted Answer · answered Aug 15 '23 at 19:50

The fork method copies the parent process, including its memory and resources, and the child continues from that point. The spawn method instead starts a new instance of the Python interpreter and recreates that point (by "point" I mean the state of the process: what resources it has, etc.).

As you can imagine, copying existing process state has far less overhead than starting an entirely new Python interpreter and recreating all of that state, so fork is much quicker.
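
To put rough numbers on this, the timings in the question work out to about 47 ms per loop iteration with spawn (9.4275 s / 200) versus under 3 ms with fork (0.5316 s / 200). The sketch below is not the multiprocessing machinery itself. It only approximates the two underlying mechanisms, assuming a POSIX system: a bare `os.fork()` on one side and a fresh interpreter started through `subprocess` on the other. The helper names `time_fork` and `time_fresh_interpreter` are made up for this illustration. A real spawn does more than the second helper (it also re-imports the main module and unpickles the target and its arguments), so treat that figure as a lower bound.

```python
import os
import subprocess
import sys
from time import perf_counter


def time_fork(n):
    """Average seconds to fork a child that exits immediately (POSIX only)."""
    start = perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: leave at once, skipping cleanup handlers
        os.waitpid(pid, 0)  # parent: wait for the child and reap it
    return (perf_counter() - start) / n


def time_fresh_interpreter(n):
    """Average seconds to start a new Python interpreter that does nothing."""
    start = perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return (perf_counter() - start) / n


if __name__ == "__main__":
    n = 20
    print(f"fork:              {time_fork(n) * 1000:.2f} ms per child")
    print(f"fresh interpreter: {time_fresh_interpreter(n) * 1000:.2f} ms per child")
```

The exact values depend on the machine and on how many modules the interpreter loads at startup, so no expected output is shown here.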

Dubious Denise

Yes, you're right. I was overthinking it and made it complicated. Creating a new Python interpreter and importing the module in a loop is "*that*" costly. – S.B Aug 15 '23 at 20:01
