I'm trying to read data slices from a netCDF4 file using netcdf4-python. This is my first time using Python and I am running into memory issues. Below is a simplified version of the code. On each iteration of the loop, memory usage jumps by roughly the size of the data slice I just read. How can I clean up the memory as I iterate over each variable?

#!/usr/bin/env python
from netCDF4 import Dataset
import os
import sys
import psutil
process = psutil.Process(os.getpid())
def print_memory_usage():
    # resident set size of this process, in MB
    nr_mbytes = process.memory_info().rss / 1048576.0
    sys.stdout.write("{}\n".format(nr_mbytes))
    sys.stdout.flush()
# open input file and gather variable info
rootgrp_i = Dataset('data.nc','r')
vargrp_i = rootgrp_i.variables
# let's create a dictionary to store the metadata in
subdomain = {}
for suff in range(1000):
    for var in vargrp_i:
        v_i = vargrp_i[var]
        # read a small slice of each variable, depending on its rank
        if v_i.ndim == 1:
            a = v_i[:]
        elif v_i.ndim == 2:
            a = v_i[0:20, 0:20]
        elif v_i.ndim == 3:
            a = v_i[0, 0:20, 0:20]
        elif v_i.ndim == 4:
            a = v_i[0, 0:75, 0:20, 0:20]
        else:
            a = v_i[0]
        del a
    print_memory_usage()
rootgrp_i.close()
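
For reference, I assume the next obvious thing to try is forcing the garbage collector after each slice, along the lines of the sketch below (same setup and names as the code above; the gc.collect() call is just my guess and I have not confirmed it actually releases anything here). Is that the right approach, or is the memory being held inside the netCDF4 library itself?

import gc

for suff in range(1000):
    for var in vargrp_i:
        v_i = vargrp_i[var]
        if v_i.ndim != 2:
            continue  # just the 2-D case, for brevity
        a = v_i[0:20, 0:20]
        del a
        gc.collect()  # explicitly ask Python to free anything no longer referenced
    print_memory_usage()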