Will os.fork() use copy on write or do a full copy of the parent-process in Python?

Question

I would like to load a rather large data structure into a process and then fork in the hope to reduce total memory consumption. Will os.fork work that way or copy all of the parent process in Linux (RHEL)?

Why do you believe a copy-on-write fork would reduce memory consumption? — Basile Starynkevitch, Feb 18 '13 at 17:23
Because the child process would share the fat and static datastructure that is rather large. Or atleast I presumed. — AME, Feb 18 '13 at 17:24
i think it will do COW. but that's just what i think. Interesting though, i wonder how it is implemented on windows. — Hayri Uğur Koltuk, Feb 18 '13 at 17:26
@AliVeli It's not implemented on Windows at all ("Availability: Unix"). — , Feb 18 '13 at 17:41

score 25 · Accepted Answer · 2013-02-18T17:42:23.423

Even if COW is employed, CPython uses reference counting and stores the reference count in each object's header. So unless you don't do anything with that data, you'll quickly have spurious writes to the memory in question, which will force the system to copy the data. Pass it to a function? That's another reference, an INCREF, a write to the COW'd memory. Store it in a variable or object attribute? Same. Even just look up a method on it? Ditto. Some builtin data structures allocate the bulk of their data separately from the object (e.g. most collections) for various reasons. If these end up on a different page -- or whatever granularity COW works on -- you may get lucky with those. However, an object referenced from such a collection is not exempt -- using it manipulates its refcount just the same.

In addition, a bit of data will be shared because there are no writes to it by design (e.g., the native CPython code), and some objects your fork'd process does not touch may be shared (I'm honestly not sure; I think the cycle GC does not write to the object). But Python objects used by Python code is virtually guaranteed to get written to. Similar reasoning applies to PyPy, Jython, IronPython, etc. (only that they fiddle with bits in the object header instead of doing reference counting) though I can't vouch for all possible configurations.

May I ask for a reference for those assumptions? – Michael Dorner Oct 25 '22 at 17:35 — Michael Dorner, Oct 25 '22 at 17:35

score 3 · Answer 2 · edited Feb 18 '13 at 17:35

3

If you are on a *nix system, then os.fork() will use the system's fork() call which, at least in the case of Linux, is copy-on-write:

http://linux.die.net/man/2/fork

See "notes" section

edited Feb 18 '13 at 17:35

jleahy

16,149
6
47
66

answered Feb 18 '13 at 17:23

Forhad Ahmed

1,761
13
18

2

The POSIX standard doesn't mention copy-on-write, thus a UNIX system is free to *not* use it. Also, there is no `fork` in non-*unix systems, so your initial statement is a tautology. – Bakuriu Feb 18 '13 at 17:25

Will os.fork() use copy on write or do a full copy of the parent-process in Python?

2 Answers2

Linked