Python 2.7 memory leak when using scipy to fit (minimize) a function

Question

I want to analyse some 80 measurements by fitting a model to each of them. The fit uses scipy.optimize.minimize to minimize chi-squared. The problem is that my RAM usage grows steadily until my computer crashes. The only things that should be kept are the fit parameters, so maybe 5 floats per hour (fitting takes quite a while). However, my memory usage grows by about 1 MB every second.

So far I've tried:

Forcing the garbage collector to collect every time chi_squared calls my model. This didn't help.
Looking at all variables using globals() and then using pympler.asizeof to find the total amount of space they take up. This first increases but then stays constant.
Looking at memory_profiler, but I didn't find anything relevant.
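
A check that complements the globals() inspection above is to count every object the garbage collector tracks, not only those reachable from module globals. The following is a minimal sketch using only the standard library; the helper names type_counts and growth are hypothetical and not part of the original code. Memory that lives outside the collector's view (for example, buffers owned by C extensions) will not show up here, so a flat result does not rule out a leak.

```python
from __future__ import print_function
import gc
from collections import Counter


def type_counts():
    """Count every object the garbage collector tracks, grouped by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the types whose instance count grew the most between snapshots."""
    diff = after - before  # Counter subtraction keeps only positive differences
    return diff.most_common(top)
```

Taking one snapshot before and one after a single call to the model would show whether Python-level objects of some type accumulate across the call.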

I assume that the memory leak must occur somewhere in the model function, but I can't figure out where or how to stop it. I believe this because my memory usage increases continuously even though a single model call can take a minute, so the growth seems to happen during a call rather than between calls.
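
One way to test that assumption might be to call the suspected function repeatedly, outside the minimizer, and record the process-level memory after each call. This is a minimal sketch using the standard-library resource module (Unix only); peak_rss, memory_growth and leaky are hypothetical names, and leaky is a deliberately leaking stand-in rather than the real model. Note that ru_maxrss reports the peak resident set size (kilobytes on Linux, bytes on macOS), so the readings can only stay flat or rise.

```python
from __future__ import print_function
import gc
import resource


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def memory_growth(func, n_calls=5):
    """Call func n_calls times and return the peak RSS recorded after each call."""
    readings = []
    for _ in range(n_calls):
        func()
        gc.collect()  # rule out garbage that is merely waiting to be collected
        readings.append(peak_rss())
    return readings


def leaky(_store=[]):
    # Hypothetical stand-in for the model: the mutable default argument
    # deliberately keeps every allocation alive between calls.
    _store.append([0.0] * 100000)


if __name__ == "__main__":
    print(memory_growth(leaky))
```

Passing something like lambda: basic_model(x_list, *params) in place of leaky (with params being a fixed parameter list, an assumption here) would show whether the model alone makes the readings climb.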

On request I added an MCVE which should reproduce the problem:

import numpy as np
import scipy
import scipy.optimize as op
import scipy.stats
import scipy.integrate



def fit_model(model_pmt, x_list, y_list, PMT_parra, PMT_bounds=None, tolerance=10**-1, PMT_start_gues=None):
    result = op.minimize(chi_squared, PMT_start_gues, args=(x_list, y_list, model_pmt, PMT_parra[0], PMT_parra[1], PMT_parra[2]),
                     bounds=PMT_bounds, method='SLSQP', options={"ftol": tolerance})
    print result



def chi_squared(fit_parm, x, y_val, model, *non_fit_parm):
    parm = np.concatenate((fit_parm, non_fit_parm))
    y_mod = model(x, *parm)
    X2 = sum(pow(y_val - y_mod, 2))
    return X2



def basic_model(cb_list, max_intesity, sigma_e, noise, N, centre1, centre2, sigma_eb, min_dist=10**-5):
        """
        Plateau function consisting of the product of two Gaussian CDF terms.
        """
        def get_distance(x, r):
            dist = abs(x - r)
            if dist < min_dist:
                dist = min_dist
            return dist

        def amount_of_material(x):
            A = scipy.stats.norm.cdf((x - centre1) / sigma_e)
            B = (1 - scipy.stats.norm.cdf((x - centre2) / sigma_e))
            cube =  A * B
            return cube

        def amount_of_field_INTEGRAL(x, cb):
            """Integral that is part of my sum"""
            result = scipy.integrate.quad(lambda r: scipy.stats.norm.pdf((r - cb) / sigma_b) / pow(get_distance(x, r), N),
                                          start, end, epsabs=10 ** -1)[0]
            return result



        # Set some constants, not important
        sigma_b = (sigma_eb**2-sigma_e**2)**0.5
        start, end = centre1 - 3 * sigma_e, centre2 + 3 * sigma_e
        integration_range = np.linspace(start, end, int(end - start) // 20)
        intensity_list = []

        # Doing a Riemann sum, this is what takes the most time.
        for i, cb_point in enumerate(cb_list):
            intensity = sum([amount_of_material(x) * amount_of_field_INTEGRAL(x, cb_point) for x in integration_range])
            intensity *= (integration_range[1] - integration_range[0])
            intensity_list.append(intensity)


        model_values = np.array(intensity_list) / max(intensity_list)* max_intesity + noise
        return model_values


def get_dummy_data():
    """Can be ignored, produces something resembling my data with noise"""
    # X is just a range
    x_list = np.linspace(0, 300, 300)

    # Y is some sort of step function with noise
    A = scipy.stats.norm.cdf((x_list - 100) / 15.8)
    B = (1 - scipy.stats.norm.cdf((x_list - 200) / 15.8))
    y_list = A * B * .8 + .1 + np.random.normal(0, 0.05, 300)

    return x_list, y_list


if __name__=="__main__":
    # Set some variables
    start_pmt = [0.7, 8, 0.15, 0.6]
    pmt_bounds = [(.5, 1.3), (4, 15), (0.05, 0.3), (0.5, 3)]
    pmt_par = [110, 160, 15]
    x_list, y_list = get_dummy_data()

    fit_model(basic_model, x_list, y_list,  pmt_par, PMT_start_gues=start_pmt, PMT_bounds=pmt_bounds, tolerance=0.1)

Thanks for trying to help!

This is really not the typical [MCVE](https://stackoverflow.com/help/mcve) as it looks incomplete and we can't run it. That makes helping very hard! Side note: why use SLSQP? I would not, as you have no constraints (try L-BFGS-B; although that is not necessarily the problem in your code here)! — sascha, Oct 23 '17 at 14:44
Thank you for your comment. I use SLSQP because I sometimes do have constraints and want to be able to use this function in both cases. Making an MCVE is something I hoped to avoid because a lot of stuff happens in my code, but I'll get on it and update my question when I have managed to do so! — joris267, Oct 23 '17 at 15:24
Possible duplicate of [Python 2.7 memory leak with scipy.minimze](https://stackoverflow.com/questions/46904999/python-2-7-memory-leak-with-scipy-minimze) — MB-F, Oct 24 '17 at 08:57
Please do not post questions twice. I marked this one as duplicate although it was asked first because the other one was answered. — MB-F, Oct 24 '17 at 08:59
