Understanding Python's memory allocation

Question

I'm trying to track down a suspected memory leak in a Python application which uses numpy and pandas as two of the main libraries. I can see that the application uses more and more memory over time (as DataFrames are processed).

[Figure: memory consumption per processing iteration (in MB)]

I'd like to understand what the memory is used for. I therefore used tracemalloc, but I'm struggling to reconcile its output with the memory usage I observe. The program calls tracemalloc.start() and, after executing the suspicious code, snapshot = tracemalloc.take_snapshot().

The statistics are printed with:

import gc
import tracemalloc

# Free unreachable reference cycles before taking the snapshot.
gc.collect()
snapshot = tracemalloc.take_snapshot()
for i, stat in enumerate(snapshot.statistics('filename'), 1):
    print('top_current', i, str(stat))

When I add up the size portion of the many (200-300) lines, e.g. top_current 11 <frozen importlib._bootstrap>:0: size=71.7 KiB, count=769, average=95 B (or similar), I get only a few megabytes. However, the application is consuming memory in the gigabyte range. Am I missing something? How can I accurately see which objects reside in memory (and might constitute a memory leak)?
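Rather than adding the sizes by hand, the sum can be computed from the snapshot itself. The following is a minimal sketch, not the code from my application: traced_total_bytes is a hypothetical helper name, and the list of bytes objects is only a stand-in for the suspicious code.

```python
import gc
import tracemalloc


def traced_total_bytes(snapshot: tracemalloc.Snapshot) -> int:
    """Sum the size of every per-file statistic in a snapshot (bytes)."""
    return sum(stat.size for stat in snapshot.statistics("filename"))


if __name__ == "__main__":
    tracemalloc.start()
    # Stand-in for the suspicious code: keep some allocations alive.
    payload = [bytes(1000) for _ in range(1000)]
    gc.collect()
    snapshot = tracemalloc.take_snapshot()
    total = traced_total_bytes(snapshot)
    # get_traced_memory() reports the current and peak traced size directly.
    current, peak = tracemalloc.get_traced_memory()
    print(f"sum of snapshot stats: {total / 2**20:.2f} MiB")
    print(f"get_traced_memory():   {current / 2**20:.2f} MiB (peak {peak / 2**20:.2f} MiB)")
```

Both numbers only cover allocations made while tracing was active, so they should roughly agree with each other even when they are far below what htop shows.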

I also ran a test with PYTHONMALLOCSTATS=1. It showed that the number of Python memory arenas in use increases over the run of the program.

Small block threshold = 512, in 32 size classes.

class   size   num pools   blocks in use  avail blocks
-----   ----   ---------   -------------  ------------
    0     16          32            7879           217
    1     32         419           52669           125
    2     48        1366          114740             4
    3     64        4282          269734            32
    4     80        2641          132005            45
    5     96         904           37955            13
    6    112         611           21971            25
    7    128         518           16044            14
    8    144        1728           48364            20
    9    160         238            5931            19
   10    176        3296           75808             0
   11    192         129            2691            18
   12    208         125            2365            10
   13    224         414            7434            18
   14    240          95            1512             8
   15    256          88            1319             1
   16    272          78            1085             7
   17    288          69             958             8
   18    304         651            8458             5
   19    320          57             676             8
   20    336         118            1415             1
   21    352          58             631             7
   22    368          44             476             8
   23    384          44             440             0
   24    400          53             523             7
   25    416          90             810             0
   26    432         115            1026             9
   27    448          97             864             9
   28    464          92             732             4
   29    480          75             593             7
   30    496          88             703             1
   31    512         137             953             6

# arenas allocated total           =                  614
# arenas reclaimed                 =                  321
# arenas highwater mark            =                  293
# arenas allocated current         =                  293
293 arenas * 262144 bytes/arena    =           76,808,192

# bytes in allocated blocks        =           75,168,000
# bytes in available blocks        =               70,352
0 unused pools * 4096 bytes        =                    0
# bytes lost to pool headers       =              900,096
# bytes lost to quantization       =              669,744
# bytes lost to arena alignment    =                    0
Total                              =           76,808,192
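As a sanity check on these figures, the arena total and the itemized breakdown can be recomputed. This sketch only copies the numbers printed above; the variable names are mine.

```python
# Figures copied from the PYTHONMALLOCSTATS output above.
ARENA_COUNT = 293
ARENA_SIZE = 262_144  # bytes per arena, as printed in the stats

breakdown = {
    "allocated blocks": 75_168_000,
    "available blocks": 70_352,
    "unused pools": 0,
    "pool headers": 900_096,
    "quantization": 669_744,
    "arena alignment": 0,
}

arena_bytes = ARENA_COUNT * ARENA_SIZE
breakdown_bytes = sum(breakdown.values())

print(arena_bytes)                        # 76808192
print(breakdown_bytes == arena_bytes)     # True
print(f"{arena_bytes / 2**20:.2f} MiB")   # 73.25 MiB
```

The two totals agree, so the stats are internally consistent and the small-object allocator really holds only about 73 MiB (roughly 76.8 MB).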

The number of arenas (293 in the above example) seems to be increasing. That's a problem in itself, but it accounts for only ~76 MB. However, the application's memory usage (measured by proc = psutil.Process(os.getpid()); proc.memory_info().rss and by htop, which agree) is in the range of gigabytes (~2 GB when the above stats were printed).

What's allocating all that memory, if it's not Python's arenas/pools/blocks? What else could I measure to reconcile these numbers with the memory consumption shown in htop?
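To track the gap over time, the RSS value can be compared with what tracemalloc currently traces. This is a minimal sketch: unaccounted_bytes is a hypothetical helper, and the RSS value is passed in so that it can come from psutil as above or from any other source.

```python
import tracemalloc


def unaccounted_bytes(rss_bytes: int) -> int:
    """Return RSS minus the memory tracemalloc currently traces (bytes).

    get_traced_memory() returns (0, 0) if tracing has not been started,
    in which case the whole RSS is reported as unaccounted.
    """
    traced_current, _traced_peak = tracemalloc.get_traced_memory()
    return rss_bytes - traced_current


if __name__ == "__main__":
    tracemalloc.start()
    approx_rss = 2 * 1024**3  # assumption: the ~2 GB observed in htop
    gap = unaccounted_bytes(approx_rss)
    print(f"not visible to tracemalloc: {gap / 2**30:.2f} GiB")
```

In the application I would call it as unaccounted_bytes(proc.memory_info().rss). The result necessarily includes memory that tracemalloc cannot see, such as the interpreter and shared libraries themselves and anything a C library allocates without going through Python's allocator, so it is an upper bound on the unexplained part rather than a leak size.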
