0

I recently started with Python because I have an enormous amount of data where I want to automatically fit a Gaussian to the peaks in spectra. Below is an example of three peaks that I want to fit with three individual peaks.

I have found a question where someone is looking for something very similar, How can I fit multiple Gaussian curved to mass spectrometry data in Python?, and adopted it to my script.

I have added my code at the bottom and when I run the last section I get the error "RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800." What am I missing?

The data can be downloaded at https://www.dropbox.com/s/zowawljcjco70yh/data_so.h5?dl=0

enter image description here

#%%
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.optimize import curve_fit
#%% Read data
path = 'D:/Python/data_so.h5'
f = pd.read_hdf(path, mode = 'r')
t = f.loc[:, 'Time stamp']
d = f.drop(['Time stamp', 'Name spectrum'], axis = 1)
#%% Extract desired wavenumber range
wn_st=2000
wn_ed=2500
ix_st=np.argmin(abs(d.columns.values-wn_st))
ix_ed=np.argmin(abs(d.columns.values-wn_ed))
d = d.iloc[:, ix_st:ix_ed+1]
#%% AsLS baseline correction
spectrum = 230
y = d.iloc[spectrum]

niter = 10
lam = 200000
p = 0.005

L = len(y)
D = sparse.diags([1,-2,1],[0,-1,-2], shape=(L,L-2))
w = np.ones(L)
for i in range(niter):
    W = sparse.spdiags(w, 0, L, L)
    Z = W + lam * D.dot(D.transpose())
    z = spsolve(Z, w*y)
    w = p * (y > z) + (1-p) * (y < z)

corr = d.iloc[spectrum,:] - z
#%% Plot spectrum, baseline and corrected spectrum
plt.clf()
plt.plot(d.columns, d.iloc[spectrum,:])
plt.plot(d.columns, z)
plt.plot(d.columns, corr)
plt.gca().invert_xaxis()
plt.show()
#%%
x = d.columns.values
def gauss(x, a, mu, sig):
    return a*np.exp(-(x.astype(float)-mu)**2/(2*sig**2))

fitx = x[(x>2232)*(x<2252)]
fity = y[(x>2232)*(x<2252)]
mu=np.sum(fitx*fity)/np.sum(fity)
sig=np.sqrt(np.sum(fity*(fitx-mu)**2)/np.sum(fity))

popt, pcov = curve_fit(gauss, fitx, fity, p0=[max(fity),mu, sig])
plt.plot(x, gauss(x, popt[0],popt[1],popt[2]), 'r-', label='fit')
Terranees
  • 141
  • 1
  • 2
  • 12
  • You are trying to fit to too much of the data. Try truncating between 2240 and 2300 along the x axis. The long flat areas are overwhelming the fit capacity. – James Oct 18 '18 at 10:56
  • @James I am passing fitx and fity to the curve_fit function and they range from 2232 to 2252. Should I truncate earlier in the script? – Terranees Oct 18 '18 at 12:07
  • I have an example of fitting a double Lorentzian peak equation to Raman spectroscopy of carbon nanotubes here: https://bitbucket.org/zunzuncode/RamanSpectroscopyFit - the example uses scipy's differential_evolution genetic algorithm module to estimate initial parameters for the non-linear fitter, which implements the Latin Hypercube algorithm to ensure a thorough search of parameter space. The Latin Hypercube algorithm requires parameter bounds within which to search - in the example code, these bounds are determined from the data max and min values. – James Phillips Oct 18 '18 at 13:35
  • What are the starting values for your parameters, and what are the resulting values and uncertainties? Why do you need to do `x.astype(float)` but not the same on the `fity` values? I would think you would not need to truncate the data. And the way you're doing that makes it look like a very different thing from the plots you are making. Try plotting exactly the fitx and fity data that you fit (you plot something else entirely for "data"), and plot of "best fit" using `fitx`, not `x`. – M Newville Oct 19 '18 at 21:26

0 Answers0