1

I want to fit a data set with the shape (161,14), where rows are a energy direcion and cols are the repititions of the same spectrum with varying experimental conditions.

There should be 3 different peaks in the data set, so i set up a composite model of three voigts. The goal is to have shared parameters, such that center and widths of the voigts are the same.

I found this related question Python and lmfit: How to fit multiple datasets with shared parameters?

However here the parmeters are hard wired, so i tried as below.


import h5py
import numpy as np
from lmfit import Parameters, minimize, report_fit
from lmfit.models import VoigtModel, LinearModel
from matplotlib import pyplot as plt
import cProfile

mods = None
c = [530., 531.5, 533.]
c_win = 1
sigma = 0.2
gamma = 0.2
gamma_min = 0.1
gamma_max = 1.


def objective(params, x, data):
    """ calculate total residual for fits to several data sets held
    in a 2-D array, and modeled by Gaussian functions"""
    nx, ndata = data.shape
    resid = 0.0 * data[:]
    # nx = 1
    # make residual per data set
    for i in range(ndata):
        resid[:, i] = data[:, i] - mods[i].eval(params,x=x)
    # resid = data - mods[0].eval(params, x=x)
    # now flatten this to a 1D array, as minimize() needs
    # print(resid.sum())
    return resid.flatten()


def make_param(v, params):
    for i in range(3):
        v[i].set_param_hint('amplitude', value=1e3)
        v[i].set_param_hint('center', value=c[i], min=c[i] - c_win, max=c[i] + c_win)
        v[i].set_param_hint('sigma', vary=False, value=sigma)
        v[i].set_param_hint('gamma', vary=True, expr='', value=gamma, min=gamma_min, max=gamma_max)
        params += v[i].make_params()


f = h5py.File("../../analysis.h5", "a")
raw = f["rawdata"]
proc = f["processed"]
spec_group = raw["Co0001_0042O1s_4600"]
specs = spec_group['sweeps'][()]
x = spec_group['x_b'][()]

specs2 = np.zeros((161, 14))
specs2[:, :] = specs[:, 0, :]

l0 = LinearModel(prefix="l0_")
v0 = VoigtModel(prefix="p0_")
v1 = VoigtModel(prefix="p1_")
v2 = VoigtModel(prefix="p2_")
v = [v0, v1, v2]
params = Parameters()
mod0 = l0 + v0 + v1 + v2
params += l0.make_params(intercept=3000, slope=0)

make_param(v, params)

specs2 = specs2[:, ::4]

mods = [mod0]
for i in range(1, specs2.shape[1]):
    l0 = LinearModel(prefix="l0_%i" % i)
    v0 = VoigtModel(prefix="p0_%i" % i)
    v1 = VoigtModel(prefix="p1_%i" % i)
    v2 = VoigtModel(prefix="p2_%i" % i)
    params += l0.make_params(intercept=3000, slope=0)
    v = [v0, v1, v2]
    make_param(v, params)
    params['p0_%icenter' % i].expr = 'p0_center'
    params['p1_%icenter' % i].expr = 'p1_center'
    params['p2_%icenter' % i].expr = 'p2_center'
    params['p0_%igamma' % i].expr = 'p0_gamma'
    params['p1_%igamma' % i].expr = 'p1_gamma'
    params['p2_%igamma' % i].expr = 'p2_gamma'
    params['p0_%isigma' % i].expr = 'p0_sigma'
    params['p1_%isigma' % i].expr = 'p1_sigma'
    params['p2_%isigma' % i].expr = 'p2_sigma'

    mods += [l0 + v0 + v1 + v2]

cProfile.run('result = minimize(objective, params, args=(x, specs2))')
# result = minimize(objective, params, args=(x, specs2))#,method='ampgo')
report_fit(result)

plt.figure()
plt.plot(x, specs2[:, 0], x, mods[0].eval(result.params, x=x))
plt.plot(x, specs2[:, -1], x, mods[-1].eval(result.params, x=x))
high = np.max(x)
low = np.min(x)
plt.xlim(high, low)
plt.show()

The code runs and fits are satisfactory, however it takes very long.

So i did a cprofile and it seems that most of the time is string parsing. Is this intended or is there a way to reduce this time?

Also i noticed, that 14125 evaluations had to be run for these 4 spectra. Quite a lot, right? Am I making a fundamental error in the way i define parameters or is a different minimization better for this particular problem?

Profiling and fit report: https://pastebin.com/pveD6sRe

First lines of the profiling sorted by total time:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
10163844/1288568   21.010    0.000   42.870    0.000 asteval.py:279(run)
   226048    9.725    0.000   32.178    0.000 model.py:775(make_funcargs)
 18309888    8.781    0.000   13.574    0.000 model.py:769(_strip_prefix)
   169536    6.870    0.000    6.870    0.000 lineshapes.py:63(voigt)
  1695690    4.731    0.000   54.555    0.000 parameter.py:745(_getval)
user99533
  • 11
  • 3

1 Answers1

0

Well, profiling is often tricky, but you did not include the result of the profiling or the fit report, making it hard to adequately respond too.

14000 function evaluations does seem like a lot. But I do not how realistic your initial values for the Voigt parameters are. It seems a little odd to define three Voigt functions and then constrain all the parameter to be identical. It also looks very very strange to be mixing creating a composite Model and then using lmfit.minimize.

For a simplified (but ought to be related?) case based on a lmfit example (with data from https://github.com/lmfit/lmfit-py/blob/master/examples/test_peak.dat):

#!/usrbin/env python
from numpy import loadtxt
import cProfile
from lmfit.models import  VoigtModel

data = loadtxt('examples/test_peak.dat') # from lmfit/examples
x = data[:, 0]
y = data[:, 1]

mod = VoigtModel()
pars = mod.guess(y, x=x)
pars['gamma'].set(value = 2, vary=True, expr=None)

cProfile.run("out= mod.fit(y, pars, x=x)", sort=1)

print(out.fit_report(min_correl=0.25))

I get 54 function evaluations and profile output of

42228 function calls (37487 primitive calls) in 0.054 seconds
Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        3    0.021    0.007    0.021    0.007 {built-in method numpy.dot}
33225/150    0.006    0.000    0.013    0.000 asteval.py:279(run)
       59    0.004    0.000    0.004    0.000 lineshapes.py:63(voigt)
 1050/150    0.001    0.000    0.012    0.000 asteval.py:581(on_binop)
  300/225    0.001    0.000    0.008    0.000 asteval.py:744(on_call)
     8364    0.001    0.000    0.001    0.000 {built-in method builtins.isinstance}
      156    0.001    0.000    0.001    0.000 {built-in method builtins.compile}
      412    0.001    0.000    0.013    0.000 parameter.py:740(_getval)

That suggests to me that the fit is spending some time evaluating the constraint expressions (which for Voigt are sort of complicated for fwhm and height), but it isn't dominating the runtime. I think you have more Voigt functions and a lot more function evaluations, so it might be more significant.

If I explicitly simplify the constraint expressions to be incorrect with

mod = VoigtModel()
mod.param_hints['fwhm']['expr'] = 'sigma'
mod.param_hints['height']['expr'] = 'amplitude'

pars = mod.guess(y, x=x)
pars['gamma'].set(value = 2, vary=True, expr=None)
cProfile.run("out= mod.fit(y, pars, x=x)", sort=1)

then I see a profiling output of

16723 function calls (16361 primitive calls) in 0.045 seconds
Ordered by: internal time
 ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3    0.022    0.007    0.022    0.007 {built-in method numpy.dot}
     59    0.005    0.000    0.005    0.000 lineshapes.py:63(voigt)
450/150    0.001    0.000    0.002    0.000 asteval.py:279(run)
    412    0.001    0.000    0.004    0.000 parameter.py:740(_getval)

so it isn't making as many calls to asteval but it doesn't seeem to run a lot faster either (FWIW, the number of function evals are the same).

I might suggest trying a similar strategy, perhaps removing the parameter hints for height and fwhm, perhaps with

 mod.param_hints.pop('fwhm')
 mod.param_hints.pop('height')

and seeing if that improves your run time.

I suspect that understanding why your fit is taking so many iterations might be more useful. If you have multiple Voigt peaks, you may want to check whether they are swapping places or overlapped...

M Newville
  • 7,486
  • 2
  • 16
  • 29
  • I'm still writing up a decent mwe. For this I would like to know: do you see a way to implement a "global" peak fit using like i did above without using minimize()? is it as simple as mod = Model(objective) and then running mod.fit(y, params, x=x)? – user99533 Apr 23 '19 at 16:26
  • Sorry, not sure I understand the question. A model function is different from an objective function in that the model function returns an array meant to match a data array, whereas objective function returns an array to be minimized. You can certainly build a Model with several peaks and/or other functions. The docs and examples show how to do this. But: none of that is relevant to your question here. SO is not a general or free-form support forum. Stick to the question at hand. You asked about the results of profiling and did not include those results. Fix that. – M Newville Apr 24 '19 at 12:00
  • Thanks for your answer! Added the relevant output to my Question, sorry for the delay. Can you please explain to me how to progress on the "how to use Model for global fitting" question? Open a new thread? – user99533 Apr 25 '19 at 14:02
  • @M Newville I just found the Mailing list, where I see you had a discussion about this before ([here](https://groups.google.com/forum/#!searchin/lmfit-py/fitting$20several$20datasets$20with$20common$20parameters%7Csort:date/lmfit-py/sX-xuu0BXdU/MXXdhSFvBwAJ)). I will read up on that - sorry for my misbehavior, was not aware of your mailing list – user99533 Apr 25 '19 at 14:30
  • Thanks for adding the profiling data . I don't see the same sort of results as you with a simple problem, but I'll update my answer. – M Newville Apr 25 '19 at 17:29