I am getting "high water mark" memory leak behavior when I run:
import gc
temp = [[0.1] for _ in xrange(10 ** 7)]
del temp
gc.collect()
The resident memory starts at ~7 MB, climbs to ~1000 MB, and then settles at ~312 MB. Subsequent runs do not push the retained memory above 312 MB. Why does this happen, and are there any known workarounds?
Various observations:
- It happens on Ubuntu 14.04 but not on OSX
- It does not happen in python3
- [[] for _ in xrange(10 ** 7)] does not leak
- [0.1 for _ in xrange(10 ** 7)] does not leak
- [(0.1,) for _ in xrange(10 ** 7)] does not leak
- [0.1 for _ in xrange(10 ** 7)] does not leak
- {random.random(): {0.1: 0.1} for _ in xrange(10 ** 7)} does leak
- Clearing the individual lists one at a time doesn't help
- Running in the python shell vs in a file doesn't seem to have an impact
- I have reproduced the behavior in python versions: 2.7.15, 2.7.14, 2.7.11, and 2.7.5
My first intuition was that it's caused by arenas not getting cleaned up. But that doesn't make sense, because I would then expect the same behavior with [0.1 for _ in xrange(10 ** 7)], and that doesn't happen.
Why does nesting the list/dictionary result in this high water mark behavior?
I am primarily measuring the resident memory using htop.
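To cross-check the htop readings from inside the process, a minimal sketch like the one below could be used. It assumes Linux, since it reads /proc/self/statm; on other platforms it simply returns None. The names rss_mb and measure are hypothetical helpers, not part of any library, and n is reduced to 10 ** 6 to keep the run short.

```python
import gc
import os


def rss_mb():
    """Return the current resident set size in MB, or None if /proc is unavailable.

    Assumption: Linux, where the second field of /proc/self/statm is the
    number of resident pages.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (IOError, OSError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE") / (1024.0 * 1024.0)


def measure(build, n=10 ** 6):
    """Record RSS before building, while the object is alive, and after freeing it."""
    before = rss_mb()
    temp = build(n)
    alive = rss_mb()
    del temp
    gc.collect()
    after = rss_mb()
    return before, alive, after


if __name__ == "__main__":
    # range works on both Python 2 and 3; on Python 2, xrange avoids
    # building an extra intermediate list.
    nested = lambda n: [[0.1] for _ in range(n)]
    flat = lambda n: [0.1 for _ in range(n)]
    for name, build in [("nested", nested), ("flat", flat)]:
        print(name, measure(build))
```

Comparing the third value of each tuple against the first shows how much memory stays resident after the delete and gc.collect() for the nested case versus the flat case.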