Fit an arbitrary number of parameters when calling curve_fit

Question

Closest I found to this question was here: Fitting only one parameter of a function with many parameters in python. I have a multi-parameter function that I want to be able to call with a different subset of parameters being optimised in different parts of the code (useful because for some datasets, I may be able to fix some parameters based on ancillary data). Simplified demonstration of the problem below.

from scipy.optimize import curve_fit
import numpy as np

def wrapper_func(**kwargs):
    a = kwargs['a'] if 'a' in kwargs else None
    b = kwargs['b'] if 'b' in kwargs else None
    c = kwargs['c'] if 'c' in kwargs else None
return lambda x, a, c: func(x, a, b, c)

def func(x, a, b, c):
    return a * x**2 + b * x + c

# Set parameters    
a = 0.3
b = 5
c = 17 

# Make some fake data
x_vals = np.arange(100)
y_vals = a * x_vals**2 + b * x_vals + c
noise = np.random.randn(100) * 20

# Get fit
popt, pcov = curve_fit(lambda x, a_, c_: func(x, a_, b, c_), 
                       x_vals, y_vals + noise)

# Get fit using separate function
alt_popt, alt_cov = curve_fit(wrapper_func(b=5), x_vals, y_vals + noise)

So this works, but I want to be able to pass any combination of parameters to be fixed. So here parameters a and c are optimised, and b is fixed, but if I want to fix a and optimise b and c (or any other combination), is there a way to do this neatly? I made a start with wrapper_func() above, but the same problem arises: there seems to be no way to vary which parameters are optimised, except by writing multiple lambdas (conditional on what fixed parameter values are passed). This gets ugly quickly because the equations I am working with have 4-6 parameters. I can make a version work using eval, but gather this is not recommended. As it stands I have been groping around trying to use *args with lambda, but haven't managed to get it to work. Any tips greatly appreciated!

M Newville · Accepted Answer · 2017-10-05T02:28:08.270

lmfit (https://lmfit.github.io/lmfit-py/) does exactly this. Instead of creating an array of floating point values for the parameters in the fit, one creates a Parameters object -- an ordered dictionary of Parameter objects that are used to parametrize the model for the data. Each Parameter can be fixed or varied in the fit, can have max/min bounds, or can be defined as a simple mathematical expression in terms of other Parameters in the fit.

That is, with lmfit (and its Model class that is especially useful for curve-fitting), one creates Parameters and can then decide which will be optimized and which will be held fixed.

As an example, here is a variation on the problem you pose:

import numpy as np
from lmfit import Model
import matplotlib.pylab as plt

# starting parameters
a, b, c = 0.3, 5, 17
x_vals = np.arange(100)
noise = np.random.normal(size=100, scale=0.25)
y_vals = a * x_vals**2 + b * x_vals + c + noise

def func(x, a, b, c):
    return a * x**2 + b * x + c

# create a Model from this function
model = Model(func)

# create parameters with initial values. Model will know to 
# turn function args `a`, `b`, and `c` into Parameters:
params = model.make_params(a=0.25, b=4, c=10)

# you can alter each parameter, for example, fix b or put bounds on a
params['b'].vary = False
params['b'].value = 5.3
params['a'].min = -1
params['a'].max =  1

# run fit
result = model.fit(y_vals, params, x=x_vals)

# print and plot results
print(result.fit_report())
result.plot(datafmt='--')
plt.show()

will print out:

[[Model]]
    Model(func)
[[Fit Statistics]]
    # function evals   = 12
    # data points      = 100
    # variables        = 2
    chi-square         = 475.843
    reduced chi-square = 4.856
    Akaike info crit   = 159.992
    Bayesian info crit = 165.202
[[Variables]]
    a:   0.29716481 +/- 7.46e-05 (0.03%) (init= 0.25)
    b:   5.3 (fixed)
    c:   11.4708897 +/- 0.329508 (2.87%) (init= 10)
[[Correlations]] (unreported correlations are <  0.100)
    C(a, c)                      = -0.744

(You will find that b and c are highly and negatively correlated) and show a plot like

Furthermore, the fit results including the parameters are held in result, so if you want to change what parameters are fixed, you can simply change the starting values (which have not been updated by the fit):

params['b'].vary = True
params['a'].value = 0.285
params['a'].vary = False

newresult = model.fit(y_vals, params, x=x_vals)

and then compare/contrast the two results.

This looks good. I should have mentioned in the question that I was looking into lmfit, but was wondering whether there was a reasonably straightforward way to do it without installing additional modules. If not, I guess that is why lmfit was developed! — easymc, Oct 05 '17 at 02:27
lmfit is pure python, available on pypi and conda channels -- it should not be a burden as a dependency. Of course, you can try to roll your own solution, but you could also write `curve_fit` or `numpy` yourself ;). — M Newville, Oct 06 '17 at 20:55
Okay this is very useful. Except: this is probably a new question in itself but I am unable to install using conda because of a conflict between lmfit and bottleneck (anaconda is fully up to date) :( — easymc, Oct 09 '17 at 05:00
that does seem like a different question. Not sure what the problem is, but lmfit is available on a few conda channels and PyPI, and installing from source is easy since the package is pure python. — M Newville, Oct 09 '17 at 21:28
Thanks very much for your help. I tried a different conda channel and it installed without issue. — easymc, Oct 09 '17 at 23:25

score 0 · Answer 2 · answered Oct 04 '17 at 09:43

Here my solution. I am not sure how to do it with curve_fit, but it works with leastsq. It has a wrapper function that takes the free and fixed parameters as well as a list of the free parameter positions. As leastsq calls the function with the free parameters first, hence, the wrapper has to rearrange the order.

from matplotlib import pyplot as plt
import numpy as np
from scipy.optimize import leastsq

def func(x,a,b,c,d,e):
    return a+b*x+c*x**2+d*x**3+e*x**4

#takes x, the 5 parameters and a list
# the first n parameters are free
# the list of length n gives there position, e.g. 2  parameters, 1st and 3rd order ->[1,3]
# the remaining parameters are in order, i.e. in this example it would be f(x,b,d,a,c,e)
def expand_parameters(*args):
    callArgs=args[1:6]
    freeList=args[-1]
    fixedList=range(5)
    for item in freeList:
        fixedList.remove(item)
    callList=[0,0,0,0,0]
    for val,pos in zip(callArgs, freeList+fixedList):
        callList[pos]=val
    return func(args[0],*callList)

def residuals(parameters,dataPoint,fixedParameterValues=None,freeParametersPosition=None):
    if fixedParameterValues is None:
        a,b,c,d,e = parameters
        dist = [y -func(x,a,b,c,d,e) for x,y in dataPoint] 
    else:
        assert len(fixedParameterValues)==5-len(freeParametersPosition)
        assert len(fixedParameterValues)>0
        assert len(fixedParameterValues)<5 # doesn't make sense to fix all
        extraIn=list(parameters)+list(fixedParameterValues)+[freeParametersPosition]
        dist = [y -expand_parameters(x,*extraIn) for x,y in dataPoint]
    return dist


if __name__=="__main__":
    xList=np.linspace(-1,3,15)
    fList=np.fromiter( (func(s,1.1,-.9,-.7,.5,.1) for s in xList), np.float)

    fig=plt.figure()
    ax=fig.add_subplot(1,1,1)

    dataTupel=zip(xList,fList)

    ###some test
    print residuals([1.1,-.9,-.7,.5,.1],dataTupel)
    print residuals([1.1,-.9,-.7,.5],dataTupel,fixedParameterValues=[.1],freeParametersPosition=[0,1,2,3])

    #exact fit
    bestFitValuesAll, ier = leastsq(residuals, [1,1,1,1,1],args=(dataTupel))
    print bestFitValuesAll

    ###Only a constant
    guess=[1]
    bestFitValuesConstOnly, ier = leastsq(residuals, guess,args=(dataTupel,[0,0,0,0],[0]))
    print bestFitValuesConstOnly
    fConstList=np.fromiter(( func(x,*np.append(bestFitValuesConstOnly,[0,0,0,0])) for x in xList),np.float)

    ###Only 2nd and 4th
    guess=[1,1]
    bestFitValues_1_3, ier = leastsq(residuals, guess,args=(dataTupel,[0,0,0],[2,4]))
    print bestFitValues_1_3
    f_1_3_List=np.fromiter(( expand_parameters(x, *(list(bestFitValues_1_3)+[0,0,0]+[[2,4]] ) )  for x in xList),np.float)


    ###Only 2nd and 4th with closer values
    guess=[1,1]
    bestFitValues_1_3_closer, ier = leastsq(residuals, guess,args=(dataTupel,[1.2,-.8,0],[2,4]))
    print bestFitValues_1_3_closer
    f_1_3_closer_List=np.fromiter(( expand_parameters(x, *(list(bestFitValues_1_3_closer)+[1.2,-.8,0]+[[2,4]] ) )  for x in xList),np.float)


    ax.plot(xList,fList,linestyle='',marker='o',label='orig')
    ax.plot(xList,fConstList,linestyle='',marker='o',label='0')
    ax.plot(xList,f_1_3_List,linestyle='',marker='o',label='1,3')
    ax.plot(xList,f_1_3_closer_List,linestyle='',marker='o',label='1,3 c')

    ax.legend(loc=0)

    plt.show()

Providing:

>>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
>>[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
>>[ 1.1 -0.9 -0.7  0.5  0.1]
>>[ 2.64880466]
>>[-0.14065838  0.18305123]
>>[-0.31708629  0.2227272 ]

Fit an arbitrary number of parameters when calling curve_fit

2 Answers2