Robust algorithm for detection of peak widths

Question

enter image description here

I asked how to programmatically judge spectrum bands and @detly suggested using FWHM (full width at half maximum) to determine the widths of the peaks. I searched around and found that FWHM can be used for fitting models (I am practically a layman to this!), especially Guassian models. Specifically, 2.354 * sigma is the width for a Guassian model.

I am looking at a Guassian fit because of the presence of bad peaks. There are 4 well-formed peaks in this picture. Then there is the "double" peak (while both may be important) and two spread-out peaks. They will prove an impossible challenge to a naive FWHM.

Can you help with generating a guassian fitting (for purpose of FWHM calculation) of a peak in Scip/Numpy, given it's approximate location of its x coordinate? If Guassian is a bad choice, then some other scheme.

If you have a lot of this data and want a faster way to get results try root (root.cern.ch). It is a C++ framework specifically designed for data analysis (and in your case: fit routines). — BandGap, Jun 06 '12 at 09:32

fraxel · Answer 1 · 2012-06-04T17:56:50.320

Fitting Gaussians is a good approach. And if you have okish guesses to the initial parameter values you can try and guess them all at once. A big problem is noise, really you probably want to either fit each peak in isolation (ie. only consider the range that a given peak is in at a time), or get a base line noise curve across your data and subtract it.

Here is some code that trys to fit multiple Gaussians. I've entered some pretty loose parameters for what I considered to be the 8 most prominent peaks, plus one extra very broad Guassian to try and capture the background noise. Before processing I cleaned your posted image a little, to help get the data out of it (removed the mouse cursor and axis edges, and inverted the image).

enter image description here

code:

import Image
from scipy import *
from scipy.optimize import leastsq
import numpy as np

im = Image.open("proc.png")
im = im.convert('L')
h, w = im.size
#Extract data from the processed image:
im = np.asarray(im)
y_vals, junk  = np.mgrid[w:0:-1, h:0:-1]
y_vals[im < 255] = 0
y_vals = np.amax(y_vals,axis=0)

def gaussian(x, A, x0, sig):
    return A*exp(-(x-x0)**2/(2.0*sig**2))

def fit(p,x):
    return np.sum([gaussian(x, p[i*3],p[i*3+1],p[i*3+2]) 
                   for i in xrange(len(p)/3)],axis=0)

err = lambda p, x, y: fit(p,x)-y

#params are our intitial guesses for fitting gaussians, 
#(Amplitude, x value, sigma):
params = [[50,40,5],[50,110,5],[100,160,5],[100,220,5],
          [50,250,5],[100,260,5],[100,320,5], [100,400,5],   
          [30,300,150]]  # this last one is our noise estimate
params = np.asarray(params).flatten()

x  = xrange(len(y_vals))
results, value = leastsq(err, params, args=(x, y_vals))

for res in results.reshape(-1,3):
    print "amplitude, position, sigma", res

import pylab
pylab.subplot(211, title='original data')
pylab.plot(y_vals)
pylab.subplot(212, title='guassians fit')
y = fit(results, x)
pylab.plot(x, y)
pylab.savefig('fig2.png')
pylab.show()

These are the fitted output Guassian parameters:

#Output:
amplitude, position, sigma [ 23.47418352  41.27086097   5.91012897]
amplitude, position, sigma [  16.26370263  106.39664543    3.67827824]
amplitude, position, sigma [  59.74500239  163.11210316    2.89866545]
amplitude, position, sigma [  57.91752456  220.24444939    2.91145375]
amplitude, position, sigma [  39.78579841  251.25955921    2.74646334]
amplitude, position, sigma [  86.50061756  260.62004831    3.08367483]
amplitude, position, sigma [  79.26849867  319.64343319    3.57224402]
amplitude, position, sigma [ 127.97009966  399.27833126    3.14331212]
amplitude, position, sigma [  20.21224956  379.41066063  195.47122312]

How did you "get the data out of [the picture]"? Is it more advanced than reading? :) — BandGap, Jun 06 '12 at 09:29
@BandGap - First I edited the image with Gimp to remove the mouse cursor, and axis edges, then I inverted and thresholded the image to make it binary. Then the 4 lines after `#Extract data from the processed image:` in the code extract the data: by creating a mesh of row values, setting all these values to zero if the corresponding positions are zero in the image, then it takes the maximum row index in a column, resulting in the first plot 'original data' which corresponds to the posted image. — fraxel, Jun 06 '12 at 09:44
@aitchnyu - Yes. But, really I'd need some more sample datasets, and/or better understanding of what the data can look like, (especially extreme cases). Ie. the remit needs to be a bit clearer - for your data, you need to know absolutely what constitutes a peak, and what doesn't. I'm probably not going to have much time over the next few days, so it might be worth your while offering a bounty, to encourage someone to have a crack at it.. — fraxel, Jun 07 '12 at 21:18
I modified it for a single peak, approx peak location and assumed amplitude and SD. But some peaks tend to show extreme values. So it is less robust than a naive FWHM ( @detly answered me on http://stackoverflow.com/a/10764997/604511 ). Here is some sample data: http://pastebin.com/b3VVZ24w — Jesvin Jose, Jun 16 '12 at 05:23

Robust algorithm for detection of peak widths

1 Answers1