
Fitting multiple data sets with lmfit is covered in this question: Python and lmfit: How to fit multiple datasets with shared parameters?

However, that approach uses a fitting/objective function written by the user.

I was wondering whether it's possible to fit multiple data sets with lmfit without writing an objective function, by using the model.fit() method of the Model class instead.

As an example: let's say we have multiple data sets of (x, y) coordinates that we want to fit with the same model function, in order to find the single set of parameters that fits all the data best on average.

import numpy as np
from lmfit import Model, Parameters
from lmfit.models import GaussianModel

def gauss(x, amp, cen, sigma):
    return amp*np.exp(-(x-cen)**2/(2.*sigma**2))

x1 = np.arange(0., 100., 0.1)
x2 = np.arange(0., 100., 0.09)
y1 = gauss(x1, 1., 50., 5.) + np.random.normal(size=len(x1), scale=0.1)
y2 = gauss(x2, 0.8, 48.4, 4.5) + np.random.normal(size=len(x2), scale=0.1)

mod = GaussianModel()
params = mod.make_params()

mod.fit([y1, y2], params, x=[x1, x2])

I guess that if this is possible, the data has to be passed to mod.fit() in the right form. The documentation only says that model.fit() takes an array-like data argument.

I tried giving it lists and arrays. If I pass the different data sets as a list, I get `ValueError: setting an array element with a sequence`.

If I pass an array, I get `AttributeError: 'numpy.ndarray' object has no attribute 'exp'`.

So am I just trying to do something that isn't possible or am I doing something wrong?

Lipo

2 Answers


Well, I think the answer is "sort of". The lmfit.Model class is meant to represent a model for an array of data. So, if you can map your multiple datasets into a numpy ndarray (say, with np.concatenate), you can probably write a Model function to represent this by building sub-models for the different datasets and concatenating them in the same way.
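
As an untested sketch of that idea, with illustrative names (two_dataset_gauss, and n1 passed as a second independent variable are mine, not part of lmfit), that might look like:

import numpy as np
from lmfit import Model

def gauss(x, amp, cen, sigma):
    return amp*np.exp(-(x-cen)**2/(2.*sigma**2))

def two_dataset_gauss(x, n1, amp1, amp2, cen, sigma):
    # model of two concatenated datasets: the first n1 points belong to
    # dataset 1; cen and sigma are shared, amplitudes are per-dataset
    out = np.empty(len(x))
    out[:n1] = gauss(x[:n1], amp1, cen, sigma)
    out[n1:] = gauss(x[n1:], amp2, cen, sigma)
    return out

mod = Model(two_dataset_gauss, independent_vars=['x', 'n1'])
params = mod.make_params(amp1=1.0, amp2=1.0, cen=50.0, sigma=5.0)

# x1/y1 and x2/y2 as in the question; n1 marks the split point
xall = np.concatenate((x1, x2))
yall = np.concatenate((y1, y2))
result = mod.fit(yall, params, x=xall, n1=len(x1))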

I don't think you could do that with any of the built-in models. I also think that once you start down the road of writing complex model functions, it isn't a very big jump to writing objective functions. That is, what would be

def model_function(x, a, b, c):
    # do some calculation with x, a, b, c values
    result = a + x*b + x*x*c
    return result

might become

def objective_function(params, x, data):
    vals = params.valuesdict()
    return data - model_function(x, vals['a'], vals['b'], vals['c'])

If that model_function() is doing anything complex, the additional burden of unpacking the parameters and subtracting the data is pretty small. And, especially if some parameters are used for multiple datasets and some only for particular datasets, you'll have to manage that in either the model function or the objective function. In the example you link to, my answer includes a loop over datasets, picking out parameters by name for each dataset. You'll probably want to do something like that; a sketch is below. You could do the same in a model function by thinking of it as modeling the concatenated datasets, but I'm not sure you'd really gain a lot by doing that.
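
As a rough, untested sketch of that loop-over-datasets objective function (the amp_0/cen_0-style parameter names are illustrative; gauss() and the data arrays are as in the question):

import numpy as np
from lmfit import Parameters, minimize

def objective(params, x_sets, data_sets):
    # concatenated residual over all datasets; sigma is shared,
    # amplitude and center are per-dataset
    sigma = params['sigma'].value
    resid = []
    for i, (x, data) in enumerate(zip(x_sets, data_sets)):
        amp = params['amp_%d' % i].value
        cen = params['cen_%d' % i].value
        resid.append(data - gauss(x, amp, cen, sigma))
    return np.concatenate(resid)

params = Parameters()
params.add('sigma', value=5.0, min=0.01)
for i in range(2):
    params.add('amp_%d' % i, value=1.0, min=0.0)
    params.add('cen_%d' % i, value=50.0)

# the datasets may have different lengths, since the residuals are
# concatenated into a single 1-D array
result = minimize(objective, params, args=([x1, x2], [y1, y2]))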

M Newville

I found the problem. Actually, model.fit() will handle arrays of multiple data sets just fine and perform a proper fit. The correct call of model.fit() with multiple data sets would be:

import numpy as np 
from lmfit import Model, Parameters
from lmfit.models import GaussianModel
import matplotlib.pyplot as plt

def gauss(x, amp, cen, sigma):
    "basic gaussian"
    return amp*np.exp(-(x-cen)**2/(2.*sigma**2))

x1 = np.arange(0., 100., 0.1)
x2 = np.arange(0., 100., 0.1)
y1 = gauss(x1, 1., 50., 5.) + np.random.normal(size=len(x1), scale=0.01)
y2 = gauss(x2, 0.8, 48.4, 4.5) + np.random.normal(size=len(x2), scale=0.01)

mod = GaussianModel()
params = mod.make_params()

params['amplitude'].set(1., min=0.01, max=100.)
params['center'].set(1., min=0.01, max=100.)
params['sigma'].set(1., min=0.01, max=100.)

result = mod.fit(np.array([y1, y2]), params, method='basinhopping',
                 x=np.array([x1, x2]))

print(result.fit_report(min_correl=0.5))

fig, ax = plt.subplots()

plt.plot(x1, y1, lw=2, color='red')
plt.plot(x2, y2, lw=2, color='orange')
plt.plot(x1, result.eval(x=x1), lw=2, color='black')

plt.show()

The problem with the original code actually lies in the fact that my data sets don't have the same length. However, I'm not at all sure how to handle that in the most elegant way; one possibility is sketched below.
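
One simple (if not elegant) option, assuming a single shared parameter set is what's wanted: since the fit flattens the data to 1-D anyway, datasets of different lengths can be concatenated before the call:

# concatenating first handles datasets of unequal length, because the
# fit operates on a single 1-D array in any case
xall = np.concatenate((x1, x2))
yall = np.concatenate((y1, y2))
result = mod.fit(yall, params, method='basinhopping', x=xall)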

Lipo
  • hmm, you must mean something different by "multiple data set fit" than I do. I would expect that to mean that you want to fit each of the data sets (x1, y1) and (x2, y2) to Gaussians, possibly sharing parameters between the two curves. The fit you have defined here will fit 1 Gaussian to the concatenated data set (`np.concatenate((x1, x2))`, `np.concatenate((y1, y2))`). – M Newville Jul 17 '18 at 20:12
  • Are you sure? I compared the way described above with the use of an objective function and they give me exactly the same results. Just to make it clear: I want to fit the parameters of the function so that it describes all the data sets as well as possible. – Lipo Jul 19 '18 at 07:58
  • Yes. Because your `x1=x2`, and the way that arrays are flattened, what you may be getting is the fit to the average of y1 and y2. That is not what I would typically call a fit to two data sets. Put another way: your result has one value for 'center', 'amplitude', and 'sigma', right? If you had made the centers of your two "data sets" not 48.4 and 50 (within sigma), but more like 30 and 70, I expect you will end up with a good fit to *one* of the peaks. The arrays used for the data and returned by the Model *will be* flattened to 1D arrays. – M Newville Jul 19 '18 at 15:30
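
A quick way to check the behavior described in the last comment (an illustrative experiment, not from the thread): separate the two centers widely and note that the fit still reports a single center.

# with well-separated centers, the single Gaussian can only follow one
# peak (or neither), showing that one model is fit to all of the data
y1 = gauss(x1, 1.0, 30.0, 5.0) + np.random.normal(size=len(x1), scale=0.01)
y2 = gauss(x2, 0.8, 70.0, 4.5) + np.random.normal(size=len(x2), scale=0.01)
result = mod.fit(np.array([y1, y2]), params, x=np.array([x1, x2]))
print(result.params['center'].value)   # one 'center' value for both data sets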