I was tracking down an out-of-memory bug and was horrified to find that Python's multiprocessing appears to copy large arrays, even if I have no intention of using them.
Why is Python (on Linux) doing this? I thought copy-on-write would protect me from any extra copying. I imagine that whenever I reference the object, some kind of trap is invoked and only then is the copy made.
Is the correct way to solve this problem for an arbitrary data type, like a 30-gigabyte custom dictionary, to use a Monitor? Is there some way to build Python so that it doesn't have this nonsense?
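For concreteness, here is a minimal sketch of what I mean by the Monitor route, assuming it corresponds to a multiprocessing.Manager proxy (the names and the tiny dict are just illustrative, my real data is the 30 GB dictionary):

from multiprocessing import Manager, Process

def worker(shared):
    # Reads go through the proxy; the real dict lives in the manager's server process.
    print(len(shared))

if __name__ == '__main__':
    manager = Manager()
    shared = manager.dict()   # proxy object, cheap to hand to child processes
    shared['key'] = 'value'   # every access is an IPC round trip to the manager
    p = Process(target=worker, args=(shared,))
    p.start()
    p.join()
    manager.shutdown()

That would avoid copying the data into each child, but every access pays IPC overhead, which is why I'm not sure it is the right answer for 30 GB.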
import numpy as np
import psutil
from multiprocessing import Process

mem = psutil.virtual_memory()
large_amount = int(0.75 * mem.available)

def florp():
    print("florp")

def bigdata():
    return np.ones(large_amount, dtype=np.int8)

if __name__ == '__main__':
    foo = bigdata()            # allocates 0.75 of available RAM, no problems
    p = Process(target=florp)
    p.start()                  # out of memory, because bigdata's array is copied?
    print("Wow")
    p.join()
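To make the observation concrete, I could have the child report its own resident set size; a sketch along these lines (using psutil's per-process API, not part of the script above) is what I have in mind for checking whether the pages really get copied:

import os
import psutil

def florp():
    # Report this child's resident set size to see whether the parent's
    # array pages were actually duplicated, or merely reserved.
    rss = psutil.Process(os.getpid()).memory_info().rss
    print("florp, child RSS = %.2f GiB" % (rss / 2**30))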
Running:
[ebuild R ] dev-lang/python-3.4.1:3.4::gentoo USE="gdbm ipv6 ncurses readline ssl threads xml -build -examples -hardened -sqlite -tk -wininst" 0 KiB