I'm interested in finding out how much the total size of Python's heap grows when a large object is loaded. heapy seems to be what I need, but I don't understand the results.
I have a 350 MB pickle file containing a pandas DataFrame with about 2.5 million entries. When I load the file and inspect the heap with heapy afterwards, it reports that only roughly 8 MB of objects have been added to the heap.
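(For reproduction purposes, a stand-in pickle in roughly this size range can be generated with something like the snippet below; the column names and values are made up, my real DataFrame holds different data.)

import pickle
import numpy as np
import pandas as pd

# Hypothetical stand-in data: ~2.5 million rows of floats plus a string column
n = 2500000
df = pd.DataFrame({
    'a': np.random.randn(n),
    'b': np.random.randn(n),
    'c': ['row %d' % i for i in xrange(n)],
})
pickle.dump(df, open('test-df.pickle', 'wb'), pickle.HIGHEST_PROTOCOL)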
import pickle
import guppy

h = guppy.hpy()
h.setrelheap()  # mark the current heap state; later heap() calls are relative to this point
df = pickle.load(open('test-df.pickle', 'rb'))
h.heap()        # show only the objects allocated since setrelheap()
This gives the following output:
Partition of a set of 95278 objects. Total size = 8694448 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0  44700  47  4445944  51   4445944  51 str
     1  25595  27  1056560  12   5502504  63 tuple
     2   6935   7   499320   6   6001824  69 types.CodeType
 ...
What confuses me is the Total size of 8694448 bytes. That's just 8 MB. Why doesn't Total size reflect the size of the whole DataFrame df?
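For what it's worth, here is a rough sketch of how the growth could be cross-checked outside of heapy, by watching the process's peak resident set size around the load (assuming Linux, where ru_maxrss is reported in kilobytes):

import pickle
import resource

def rss_mb():
    # peak resident set size of this process; on Linux ru_maxrss is in KB
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

before = rss_mb()
df = pickle.load(open('test-df.pickle', 'rb'))
print('RSS grew by roughly %.0f MB' % (rss_mb() - before))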
(Using Python 2.7.3, heapy 0.1.10, Linux 3.2.0-48-generic-pae (Ubuntu), i686)