How can I fit multiple Gaussian curved to mass spectrometry data in Python?

Question

I would like to fit multiple Gaussian curves to Mass spectrometry data in Python. Right now I'm fitting the data one Gaussian at a time -- literally one range at a time.

Is there a more streamlined way to do this? Is there a way I can run the data through a loop to plot a Gaussian at each peak? I'm guessing there's gotta be a better way, but I've combed through the internet.

My graph for two Gaussians is shown below.

Mass Spectrometry py.plot with two Gaussian Fits

My example data can be found at: http://txt.do/dooxv

And here's my current code:

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt

from scipy.interpolate import interp1d

RGAdata = np.loadtxt("/Users/ilenemitchell/Desktop/RGAscan.txt", skiprows=14)
RGAdata=RGAdata.transpose()

x=RGAdata[0]
y=RGAdata[1]

# graph labels
plt.ylabel('ion current')
plt.xlabel('mass/charge ratio')
plt.xticks(np.arange(min(RGAdata[0]), max(RGAdata[0])+2, 2.0))
plt.ylim([10**-12.5, 10**-9])
plt.title('RGA Data Jul 25, 2017')

plt.semilogy(x, y,'b')

#fitting a guassian to a peak

def gauss(x, a, mu, sig):
return a*np.exp(-(x-mu)**2/(2*sig**2))


fitx=x[(x>40)*(x<43)]
fity=y[(x>40)*(x<43)]
mu=np.sum(fitx*fity)/np.sum(fity)
sig=np.sqrt(np.sum(fity*(fitx-mu)**2)/np.sum(fity))

print (mu, sig, max(fity))

popt, pcov = opt.curve_fit(gauss, fitx, fity, p0=[max(fity),mu, sig])
plt.semilogy(x, gauss(x, popt[0],popt[1],popt[2]), 'r-', label='fit')

#second guassian

fitx2=x[(x>26)*(x<31)]
fity2=y[(x>26)*(x<31)]
mu=np.sum(fitx2*fity2)/np.sum(fity2)
sig=np.sqrt(np.sum(fity2*(fitx2-mu)**2)/np.sum(fity2))

print (mu, sig, max(fity2))

popt2, pcov2 = opt.curve_fit(gauss, fitx2, fity2, p0=[max(fity2),mu, sig])
plt.semilogy(x, gauss(x, popt2[0],popt2[1],popt2[2]), 'm', label='fit2')

plt.show()

Could you provde some example data please? Also, could you show your image maybe with arrows to indicate what exactly you would like to be highlighted with Gaussian fits? — fsimkovic, Jul 28 '17 at 18:06
Sure thing. I just updated the photo (linked above). And I also uploaded a link to example data. Thx — MsPhyz, Jul 28 '17 at 18:21
You have to come up with a way to identify peaks and their surrounding ranges, most likely using a rolling window technique. Once you have that function written, you can cycle through the entire data set. — Alex F, Jul 28 '17 at 18:29
Hmmm okay not sure what a rolling window function is...but for now I just defined a variable with given ranges (which is fine this time, because the peaks are countable. But does the "rolling window function create an array and guess the ranges for you? — MsPhyz, Jul 28 '17 at 21:15

score 0 · Answer 1 · edited Jul 28 '17 at 18:46

Here's some sample code of identifying peaks in a data set to get you started. You can find a link to all the examples here.

import numpy as np
import peakutils
cb = np.array([-0.010223, ... ])
indexes = peakutils.indexes(cb, thres=0.02/max(cb), min_dist=100)
# [ 333  693 1234 1600]

interpolatedIndexes = peakutils.interpolate(range(0, len(cb)), cb, ind=indexes)
# [  332.61234263   694.94831376  1231.92840845  1600.52446335]

score 0 · Answer 2 · answered Jul 28 '17 at 18:44

In addition to Alex F's answer, you need to identify peaks and analyze their surroundings to identify the xmin and xmax values.

If you have done that, you can use this slightly refactored code and the loop within to plot all relevant data

import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as opt

from scipy.interpolate import interp1d

def _gauss(x, a, mu, sig):
    return a*np.exp(-(x-mu)**2/(2*sig**2))

def gauss(x, y, xmin, xmax):
    fitx = x[(x>xmin)*(x<xmax)]
    fity = y[(x>xmin)*(x<xmax)]
    mu = np.sum(fitx*fity)/np.sum(fity)
    sig = np.sqrt(np.sum(fity*(fitx-mu)**2)/np.sum(fity))

    print (mu, sig, max(fity))

    popt, pcov = opt.curve_fit(_gauss, fitx, fity, p0=[max(fity), mu, sig])
    return _gauss(x, popt[0], popt[1], popt[2])

# Load data and define x - y
RGAdata = np.loadtxt("/Users/ilenemitchell/Desktop/RGAscan.txt", skiprows=14)
x, y = RGAdata.T

# Create the plot
fig, ax = plt.subplots()
ax.semilogy(x, y, 'b')

# Plot the Gaussian's between xmin and xmax
for xmin, xmax in [(40, 43), (26, 31)]:
    yG = gauss(x, y, xmin, xmax)
    ax.semilogy(x, yG)

# Prettify the graph
ax.set_xlabel("mass/charge ratio")
ax.set_ylabel("ion current")
ax.set_xticks(np.arange(min(x), max(x)+2, 2.0))
ax.set_ylim([10**-12.5, 10**-9])
ax.set_title("RGA Data Jul 25, 2017")

plt.show()

score 0 · Answer 3 · answered Jul 30 '17 at 23:33

You may find the lmfit module (https://lmfit.github.io/lmfit-py/) helpful. This provides a pre-built GaussianModel class for fitting a peak to a single Gaussian and supports adding multiple Models (not necessarily Gaussians, but also other peak models and other functions that might be useful for backgrounds and so for) into a composite model that can be fit at once.

Lmfit supports fixing or giving a range to some Parameters, so that you could build a model as a sum of Gaussians with fixed positions, limiting the value for the centroid to vary with some range (so the peaks cannot get confused). In addition, you can impose simple mathematical constraints on parameter values, so that you might require that all peak widths are the same size (or related in some simple form).

In particular, you might look to https://lmfit.github.io/lmfit-py/builtin_models.html#example-3-fitting-multiple-peaks-and-using-prefixes for an example a fit using 2 Gaussians and a background function.

For peak finding, I've found scipy.signal.find_peaks_cwt to be pretty good.

How can I fit multiple Gaussian curved to mass spectrometry data in Python?

3 Answers3

Linked