Does Python's os.fork use the same Python interpreter?

Question

I understand that threads in Python share a single instance of the Python interpreter. Is the same true of a process created by os.fork, or does each process created by os.fork have its own interpreter?

score 15 · Accepted Answer · answered May 11 '15 at 00:07

Whenever you fork, the entire Python process is duplicated in memory (including the Python interpreter, your code and any libraries, the current stack, etc.) to create a second process. This is one reason why forking a process is much more expensive than creating a thread.

This creates a new copy of the Python interpreter.
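A minimal sketch of what "a new copy" means in practice, assuming a POSIX platform where `os.fork` is available. The function name `fork_and_mutate` is made up for illustration: the child changes a list, and the parent's list is unaffected because each process has its own copy of the interpreter state.

```python
import os


def fork_and_mutate():
    """Fork, let the child modify a list, and return what the parent still sees."""
    data = ["original"]
    pid = os.fork()  # POSIX only; not available on Windows
    if pid == 0:
        # Child: this changes only the child's copy of `data`.
        data.append("changed in child")
        os._exit(0)  # leave right away without running the parent's cleanup
    # Parent: wait for the child, then look at our own copy.
    os.waitpid(pid, 0)
    return data


if __name__ == "__main__":
    print(fork_and_mutate())  # ['original']
```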

One advantage of having two Python interpreters running is that you now have two GILs (Global Interpreter Locks), and therefore can have true multiprocessing on a multi-core system.

Threads in one process share the same GIL, meaning only one of them runs Python bytecode at a given moment, giving only the illusion of parallelism.
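For contrast, here is a small sketch of the thread case (the helper name `thread_and_mutate` is hypothetical). A thread runs inside the same process and the same interpreter, so it reports the same PID and its changes to a list are visible to the main thread.

```python
import os
import threading


def thread_and_mutate():
    """Run a worker thread and return (shared list, whether the PID matched)."""
    data = ["original"]
    pids = []

    def worker():
        # Same process, same interpreter: this mutates the one shared list.
        data.append("changed in thread")
        pids.append(os.getpid())

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return data, pids[0] == os.getpid()


if __name__ == "__main__":
    print(thread_and_mutate())  # (['original', 'changed in thread'], True)
```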

score 14 · Answer 2 · answered May 11 '15 at 00:21

While fork does indeed create a copy of the current Python interpreter rather than running with the same one, it usually isn't what you want, at least not on its own. Among other problems:

There can be problems forking multi-threaded processes on some platforms. And some libraries (most famously Apple's Cocoa/CoreFoundation) may start threads for you in the background, or use thread-local APIs even though you've only got one thread, etc., without your knowledge.
Some libraries assume that every process will be initialized properly, but if you fork after initialization that isn't true. Most infamously, if you let ssl seed its PRNG in the main process, then fork, you now have potentially predictable random numbers, which is a big hole in your security.
Open file descriptors are inherited (as dups) by the children, with details that vary in annoying ways between platforms.
POSIX only guarantees that a very specific set of (async-signal-safe) functions can be called between a fork and an exec, at least when the forking process was multi-threaded. If you never call exec, you can only use those functions. Which basically means you can't do anything portably.
Anything to do with signals is especially annoying and nonportable after fork.

See POSIX fork or your platform's manpage for details on these issues.
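The "initialized before the fork" problem can be illustrated without ssl. The sketch below is only an analogy, assuming a POSIX platform: it uses a private `random.Random` instance as a stand-in for a library-owned PRNG, and the function name is made up. (As far as I know, recent Python versions reseed the module-level `random` functions in the child, which is why the sketch uses its own instance.) Because the generator's state is copied by the fork, parent and child produce the same "random" number.

```python
import os
import random


def draw_in_parent_and_child():
    """Return (parent_value, child_value) drawn from a PRNG seeded before fork."""
    rng = random.Random()  # seeded once, before the fork
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits a copy of the PRNG state and reports its next draw.
        try:
            os.close(read_fd)
            os.write(write_fd, repr(rng.random()).encode())
        finally:
            os._exit(0)
    # Parent: read the child's value, then draw from our own copy.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_value = float(reader.read())
    os.waitpid(pid, 0)
    return rng.random(), child_value


if __name__ == "__main__":
    parent_value, child_value = draw_in_parent_and_child()
    print(parent_value == child_value)  # True
```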

The right answer is almost always to use multiprocessing, or concurrent.futures (which wraps up multiprocessing), or a similar third-party library.
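As a rough sketch of that recommendation, here is a CPU-bound toy task run through `concurrent.futures.ProcessPoolExecutor`. The task `count_primes` and the worker count are arbitrary choices for illustration, not anything from the question.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """CPU-bound toy task: count primes below `limit` by trial division."""
    return sum(
        1
        for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def main():
    limits = [10, 100, 1000]
    # Each task runs in a separate worker process with its own interpreter.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(count_primes, limits))
    print(dict(zip(limits, results)))  # {10: 4, 100: 25, 1000: 168}


if __name__ == "__main__":
    main()
```

The `if __name__ == "__main__":` guard matters here: with start methods that launch a fresh interpreter, the main module is imported again in each worker.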

With Python 3.4+, you can even specify a start method. The fork method basically just calls fork. The forkserver method runs a single "clean" process (no threads, signal handlers, SSL initialization, etc.) and forks off new children from that. The spawn method calls fork then exec, or an equivalent like posix_spawn, to get you a brand-new interpreter instead of a copy. So you can start off with fork, but then if there are any problems, switch to forkserver or spawn and nothing else in your code has to change. Which is pretty nice.
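A small sketch of how that switch might look, using `multiprocessing.get_context`. The helper `run_with` and the worker function `square` are made-up names, and the example assumes a platform where the fork start method exists.

```python
import multiprocessing as mp


def square(x):
    return x * x


def run_with(method):
    """Map `square` over a few numbers using the given start method."""
    ctx = mp.get_context(method)  # "fork", "forkserver", or "spawn"
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Switching start methods means changing only this string.
    print(run_with("fork"))  # [1, 4, 9]
```

`mp.get_all_start_methods()` lists the methods your platform supports, if you want to check before picking one.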

abarnert

Nice response to some of the many caveats of "blindly ``fork()``ing" :) – James Mills May 11 '15 at 00:43
@JamesMills: Now that you put "blind" and "fork" together, I've got the Angry Samoans' [Lights Out](http://www.plyrics.com/lyrics/angrysamoans/lightsout.html) stuck in my head, and it's not on my iTunes Match. Thanks a lot. :P – abarnert May 11 '15 at 00:45
Hahaha sorry! :) Puns are funny even if it's having fun at one's expense :) – James Mills May 11 '15 at 00:48
So ... is multiprocessing safe with locks? The docs don't seem clear. I assume it calls a fork somewhere in its implementation -- which, if it's not followed by an exec, is unsafe. – user48956 Apr 16 '18 at 20:06
@user48956 I’m not sure what you’re asking. This answer explains how multiprocessing works under the covers (e.g., it may be passing a message to a fork server, or calling a spawn/CreateProcess function, not forking), and the linked docs explain it in more detail. Meanwhile, of course multiprocessing is safe with the locks and other sync objects from the multiprocessing module, but isn’t generally safe with threading locks. (It should just give you errors for even trying to share threading locks, but with the fork start method, you can sometimes get around the checks and have useless locks). – abarnert Apr 16 '18 at 20:19
Fork then exec seems safe with thread locak --- though I've also heard that its not. I think the documentation is silent on what happens to a threading.Lock with multiprocessing -- it only mentions multiprocessing.Process.Lock. – user48956 Apr 16 '18 at 20:27
@user48956 Are you asking about thread locals (which should be safe if they work at all, because the whole point of thread locals is that they're _not_ shared), or about thread locks? Threading locks will generally not be safe, because POSIX specifies that pthread sync objects are not usable across processes (a platform could make them usable anyway, but there's no good reason to). I don't think the docs mention this because it should be pretty obvious from the fact that `multiprocessing.Lock` exists that you're supposed to use it rather than `threading.Lock`. – abarnert Apr 16 '18 at 20:43
@user48956 Also, as for what happens after you call `os.fork`, the Python docs explicitly defer that to the platform rather than trying to define anything. There are few safe ways to use `os.fork` for anything non-trivial, except for immediately calling `exec` and starting over with a brand new interpreter, so there's no reason to document all of the huge number of things you can't do. – abarnert Apr 16 '18 at 20:45
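On the lock question in the comments above, a hedged sketch of the `multiprocessing.Lock` approach: several processes increment a shared counter, each taking the lock around the update. The names `add_many` and `run_counter` are invented for this example, and the raw shared value is created with `lock=False` only so that the explicit lock is the thing doing the work.

```python
import multiprocessing as mp


def add_many(counter, lock, times):
    for _ in range(times):
        with lock:  # a multiprocessing lock, usable across processes
            counter.value += 1


def run_counter(n_procs, times, method=None):
    """Increment a shared counter from several processes; return the total."""
    ctx = mp.get_context(method)  # None means the platform default
    lock = ctx.Lock()
    counter = ctx.Value("i", 0, lock=False)  # raw shared int, locked by hand
    procs = [
        ctx.Process(target=add_many, args=(counter, lock, times))
        for _ in range(n_procs)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run_counter(4, 1000))  # 4000
```

Swapping `ctx.Lock()` for a `threading.Lock()` here is exactly the case the comments warn about: it is not meant to be shared between processes.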

James Mills · Answer 3 · 2015-05-11T00:54:46.093

os.fork() is equivalent to the fork() syscall in many UNIXes. So yes, your subprocess(es) will be separate from the parent and have a different interpreter (as such).

man fork:

FORK(2)

NAME
       fork - create a child process

SYNOPSIS
       #include <unistd.h>

       pid_t fork(void);

DESCRIPTION
       fork() creates a new process by duplicating the calling process. The new process, referred to as the child, is an exact duplicate of the calling process, referred to as the parent, except for the following points:

pydoc os.fork():

os.fork() Fork a child process. Return 0 in the child and the child’s process id in the parent. If an error occurs OSError is raised.

Note that some platforms including FreeBSD <= 6.3, Cygwin and OS/2 EMX have known issues when using fork() from a thread.
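To make the return-value convention from the pydoc entry concrete, here is a sketch (POSIX only) that forks, immediately calls exec in the child to start a brand-new interpreter, and collects the exit status in the parent. The function name `run_fresh_interpreter` is hypothetical, and the status decoding assumes the child exits normally.

```python
import os
import sys


def run_fresh_interpreter(code):
    """Fork, exec a new Python in the child, and return its exit status."""
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Replace this process image right away.
        try:
            os.execv(sys.executable, [sys.executable, "-c", code])
        finally:
            os._exit(127)  # only reached if exec itself failed
    # Parent: fork() returned the child's process id.
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)  # assumes a normal exit, not a signal


if __name__ == "__main__":
    print(run_fresh_interpreter("print('hello from a new interpreter')"))
```

In everyday code the `subprocess` module does this for you; the sketch is only meant to show the mechanics.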

See also: Martin Konecny's response as to the why's and advantages of "forking" :)

For completeness, other approaches to concurrency that don't involve a separate process, and therefore a separate Python interpreter, include:

Green or lightweight threads, à la greenlet
Coroutines, à la Python generators and the `yield from` syntax added in Python 3.3
Async I/O, à la asyncio, Twisted, circuits, etc.
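As one illustration of the single-process options listed above, a small asyncio sketch (Python 3.7+ for `asyncio.run`; the coroutine names are made up). Both tasks run concurrently, yet each reports the same PID as the caller, since nothing is forked.

```python
import asyncio
import os


async def report(name, delay):
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking
    return name, os.getpid()


async def main():
    # gather() returns results in the order the awaitables were passed in.
    return await asyncio.gather(report("a", 0.02), report("b", 0.01))


if __name__ == "__main__":
    results = asyncio.run(main())
    print(all(pid == os.getpid() for _, pid in results))  # True
```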

One thing to note: `asyncio` actually uses yield-from coroutines (although in 3.5, it may use the new [`async`/`await` coroutines](https://www.python.org/dev/peps/pep-0492/) instead). It might also be worth mentioning more explicit callback-driven/future-driven designs like Twisted, and just directly looping over a `selector` or `select`. Or the equivalents with GUI-style event loops. But I already gave you a +1, so you may not get any extra votes for adding all that. :) — abarnert, May 11 '15 at 00:48
@abarnert There ya go :) It's nice to include the many options I agree :) — James Mills, May 11 '15 at 00:55
