As you alluded to, fork() is a bit of a mad syscall that has stuck around for historical reasons. There's a great article about its flaws here, and also this post goes into some details and potential workarounds.
Although on Linux fork() is optimised to use copy-on-write for the memory, it's still not "free" because:
- It still has to do some memory-related admin (new page tables, etc.)
- If you're using RAII (e.g. in C++ or possibly Rust) then every object that exists at the time of the fork is copied, so its destructor can run in both the parent and the child. That might even lead to logic errors (e.g. deleting a temporary file twice).
- It's likely that the parent process will keep running, probably modifying lots of its memory, and every page it writes to then has to be copied after all.
The alternatives appear to be:
- vfork()
- clone()
- posix_spawn()
vfork() was created for the common use case of doing fork() and then execve() to run a program. execve() replaces all of the memory of the current process with a new set, so there's no point copying the parent process's memory if you're just about to obliterate it.
So vfork() doesn't do that. Instead the child runs in the same memory space as the parent process, and the parent is paused until the child calls execve() (or _exit()). The Linux man page for vfork() says that doing just about anything in the child other than calling execve() or _exit() is undefined behaviour.
posix_spawn() is basically a nice wrapper around vfork() and then execve().
clone() is similar to fork() but allows you to specify exactly what is shared with the child rather than copied (file descriptors, memory, etc.). It has a load of options, including one (CLONE_VM) which lets the child process run in the same address space as the parent, which is pretty wild! I guess that is the lightest-weight way to make a new process because it doesn't involve any copying of memory at all!
But in practice I think in most situations you should either:
- Use threads, or
- Use posix_spawn().
(Note, I am just researching this now; I'm not an expert so I might have got some things wrong.)