From this question and its answers, I think I understand why this Python code:
import multiprocessing
import time

big_list = [
    {j: 0 for j in range(200000)}
    for i in range(60)
]

def worker():
    for dic in big_list:
        for key in dic:
            pass
        print "."
        time.sleep(0.2)

w = multiprocessing.Process(target=worker)
w.start()
time.sleep(3600)
keeps using more and more memory while it runs: the child process keeps updating the reference counts of objects that are shared copy-on-write with the parent, so the pages holding them get copied into the child (I can watch the free memory shrinking via cat /proc/meminfo | grep MemFree).
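For reference, this is roughly how I watch it from Python instead of running cat by hand (just a sketch; the 0.5 s polling interval is an arbitrary choice of mine):

import time

def watch_memfree(interval=0.5):
    # Print the MemFree line of /proc/meminfo every `interval` seconds (Linux only).
    while True:
        with open("/proc/meminfo") as meminfo:
            for line in meminfo:
                if line.startswith("MemFree"):
                    print line.strip()
                    break
        time.sleep(interval)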
What I don't understand, however, is why the same thing happens if the iteration takes place in the parent rather than in the child:
def worker():
    time.sleep(3600)

w = multiprocessing.Process(target=worker)
w.start()

for dic in big_list:
    for key in dic:
        pass
    print "."
    time.sleep(0.2)
The child doesn't even need to know that big_list exists.
In this small example I can work around the problem by putting del big_list at the start of the child function, but sometimes the references are not as easy to reach as this one, so things get complicated.
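Concretely, the workaround looks roughly like this (a sketch only; the global statement is needed so that del removes the module-level name inside the function):

def worker():
    # Drop the child's reference to big_list right away, so this process
    # never touches its reference counts.
    global big_list
    del big_list
    time.sleep(3600)

w = multiprocessing.Process(target=worker)
w.start()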
Why does this mechanism kick in here, and how can I avoid it properly?