I would like to load a rather large data structure into a process and then fork in the hope to reduce total memory consumption. Will os.fork
work that way or copy all of the parent process in Linux (RHEL)?

- 72,968
- 25
- 171
- 229

- 2,499
- 5
- 29
- 45
-
Why do you believe a copy-on-write fork would reduce memory consumption? – Basile Starynkevitch Feb 18 '13 at 17:23
-
4Because the child process would share the fat and static datastructure that is rather large. Or atleast I presumed. – AME Feb 18 '13 at 17:24
-
i think it will do COW. but that's just what i think. Interesting though, i wonder how it is implemented on windows. – Hayri Uğur Koltuk Feb 18 '13 at 17:26
-
1@AliVeli It's not implemented on Windows at all ("Availability: Unix"). – Feb 18 '13 at 17:41
2 Answers
Even if COW is employed, CPython uses reference counting and stores the reference count in each object's header. So unless you don't do anything with that data, you'll quickly have spurious writes to the memory in question, which will force the system to copy the data. Pass it to a function? That's another reference, an INCREF
, a write to the COW'd memory. Store it in a variable or object attribute? Same. Even just look up a method on it? Ditto.
Some builtin data structures allocate the bulk of their data separately from the object (e.g. most collections) for various reasons. If these end up on a different page -- or whatever granularity COW works on -- you may get lucky with those. However, an object referenced from such a collection is not exempt -- using it manipulates its refcount just the same.
In addition, a bit of data will be shared because there are no writes to it by design (e.g., the native CPython code), and some objects your fork
'd process does not touch may be shared (I'm honestly not sure; I think the cycle GC does not write to the object). But Python objects used by Python code is virtually guaranteed to get written to. Similar reasoning applies to PyPy, Jython, IronPython, etc. (only that they fiddle with bits in the object header instead of doing reference counting) though I can't vouch for all possible configurations.
If you are on a *nix system, then os.fork()
will use the system's fork()
call which, at least in the case of Linux, is copy-on-write:
http://linux.die.net/man/2/fork
See "notes" section

- 16,149
- 6
- 47
- 66

- 1,761
- 13
- 18
-
2The POSIX standard doesn't mention copy-on-write, thus a UNIX system is free to *not* use it. Also, there is no `fork` in non-*unix systems, so your initial statement is a tautology. – Bakuriu Feb 18 '13 at 17:25