
I have been trying to figure out the full width at half maximum (FWHM) of the blue peak (see image). The green peak and the magenta peak combined make up the blue peak. I have been using the following equation to find the FWHM of the green and magenta peaks: `fwhm = 2*np.sqrt(2*math.log(2))*sd`, where `sd` is the standard deviation. I created the green and magenta peaks, so I know their standard deviations, which is why I can use that equation.

I created the green and magenta peaks using the following code:

import numpy as np

def make_norm_dist(x, mean, sd):
    # evaluate the Gaussian PDF at each point in x
    norm = []
    for i in range(x.size):
        norm += [1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x[i] - mean)**2/(2*sd**2))]
    return np.array(norm)

If I did not know the blue peak was made up of two peaks and I only had the blue peak in my data, how would I find the FWHM?

I have been using this code to find the peak top:

peak_top = 0.0
for y in y_values:   # scan the y-data; the peak height is a y-value
    if y > peak_top:
        peak_top = y

I could divide `peak_top` by 2 to find the half height and then look for the x-values where the curve crosses that half height, but I would run into trouble if no sampled y-value exactly matches it.

I am pretty sure there is a more elegant solution than the one I am trying.

Harpal
  • Why not just calculate the standard deviation of the blue peak and use your equation that relates the FWHM to the standard deviation? – Justin Peel May 14 '12 at 12:46
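
A sketch of that suggestion (the `x` and `blue` arrays are rebuilt here for self-containedness; note that this moment-based estimate treats the blended peak as a single Gaussian, so it is only approximate):

import numpy as np

x = np.linspace(10, 110, 1000)
blue = np.exp(-(x - 50)**2 / 200.) + np.exp(-(x - 60)**2 / 200.)  # the blended peak

# treat the curve as a weighted density and take the moment-based standard deviation
mean = np.sum(x * blue) / np.sum(blue)
sd = np.sqrt(np.sum(blue * (x - mean)**2) / np.sum(blue))
print(2 * np.sqrt(2 * np.log(2)) * sd)  # ~26.3, vs. the true ~26.9 found below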

9 Answers


You can use a spline to fit [the blue curve minus peak/2], and then find its roots:

import numpy as np
from scipy.interpolate import UnivariateSpline

def make_norm_dist(x, mean, sd):
    return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))

x = np.linspace(10, 110, 1000)
green = make_norm_dist(x, 50, 10)
pink = make_norm_dist(x, 60, 10)

blue = green + pink   

# create a spline of x and blue-np.max(blue)/2 
spline = UnivariateSpline(x, blue-np.max(blue)/2, s=0)
r1, r2 = spline.roots() # find the roots

import pylab as pl
pl.plot(x, blue)
pl.axvspan(r1, r2, facecolor='g', alpha=0.5)
pl.show()

Here is the result:

[Plot: the blue curve with the FWHM interval between the two spline roots shaded in green]

HYRY

This worked for me in IPython (quick and dirty; it can be reduced to three lines):

import numpy as np

def FWHM(X, Y):
    half_max = max(Y) / 2.
    # find where the function crosses the half_max line: the sign of
    # (half_max - Y) flips at each crossing
    d = np.sign(half_max - np.array(Y[0:-1])) - np.sign(half_max - np.array(Y[1:]))
    #plot(X[0:len(d)], d)  # if you are interested
    # find the left-most and right-most crossing indices
    left_idx = np.where(d > 0)[0][0]
    right_idx = np.where(d < 0)[0][-1]
    return X[right_idx] - X[left_idx]  # return the difference (full width)

Some additions can be made to make the resolution more accurate, but in the limit that there are many samples along the X axis and the data is not too noisy, this works great.

Even when the data are not Gaussian and a little noisy, it worked for me (I just take the first and last time half max crosses the data).
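
One possible refinement (a sketch, not part of the original answer) is to linearly interpolate each half-max crossing instead of taking the nearest sample, assuming the peak does not touch either end of the array:

import numpy as np

def FWHM_interp(X, Y):
    X, Y = np.asarray(X), np.asarray(Y)
    half_max = Y.max() / 2.
    above = Y >= half_max
    left = np.argmax(above)                       # first sample at/above half max
    right = len(Y) - np.argmax(above[::-1]) - 1   # last sample at/above half max
    # linearly interpolate between the bracketing samples on each side
    x_left = np.interp(half_max, [Y[left - 1], Y[left]], [X[left - 1], X[left]])
    x_right = np.interp(half_max, [Y[right + 1], Y[right]], [X[right + 1], X[right]])
    return x_right - x_left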

FizxMike
  • Quick & dirty but very useful. The first improvement would be a linear interpolation at each end, I guess. Note that `d` ends up with one item fewer than `Y`, so plotting `d` against `X` doesn't work; you need to plot `d` against `X[0:len(d)]`. – Chris H Jan 24 '14 at 17:21
  • The numpy array method this answer uses deserves to be ranked much higher. I'm finding FWHM values of 441 x-ray diffraction peaks to create a map, and it's orders of magnitude faster than the UnivariateSpline method. I ended up reducing it to three lines: `d = Y - (max(Y) / frac)`, `indexes = np.where(d > 0)[0]`, `return abs(X[indexes[-1]] - X[indexes[0]])`, where `frac` is the divisor for the height you want: for FWHM, frac = 2; for full width at tenth maximum, frac = 10; etc. – zeppelin_d Jun 22 '16 at 15:39
  • How can this be used in multi-peak datasets? – FaCoffee Aug 02 '18 at 16:01
  • `d` is an array of -1, 0, or +1; just find the distance between adjacent pairs of sign flips. – FizxMike Aug 02 '18 at 21:52
  • I'm just wondering where the `find` function in the original version of this answer came from? It doesn't seem to be a built-in, and it's not in `numpy` either. – Darren Christopher Jun 15 '20 at 04:26
  • @DarrenChristopher It looks like `find()` should be replaced with `numpy.where()`, as the code above now does. – Corey Kelly Dec 16 '20 at 03:45

If your data has noise (and it always does in the real world), a more robust solution would be to fit a Gaussian to the data and extract FWHM from that:

import numpy as np
import scipy.optimize as opt

def gauss(x, p): # p[0]==mean, p[1]==stdev
    return 1.0/(p[1]*np.sqrt(2*np.pi))*np.exp(-(x-p[0])**2/(2*p[1]**2))

# Create some sample data
known_param = np.array([2.0, .7])
xmin,xmax = -1.0, 5.0
N = 1000
X = np.linspace(xmin,xmax,N)
Y = gauss(X, known_param)

# Add some noise
Y += .10*np.random.random(N)

# Renormalize to a proper PDF
Y /= ((xmax-xmin)/N)*Y.sum()

# Fit a Gaussian
p0 = [0, 1] # Initial guess is a standard normal distribution
errfunc = lambda p, x, y: gauss(x, p) - y # Distance to the target function
p1, success = opt.leastsq(errfunc, p0[:], args=(X, Y))

fit_mu, fit_stdev = p1

FWHM = 2*np.sqrt(2*np.log(2))*fit_stdev
print "FWHM", FWHM

[Plot: the noisy samples, the fitted Gaussian in red, and the FWHM span shaded in green]

The plotted image can be generated by:

from pylab import *
plot(X,Y)
plot(X, gauss(X,p1),lw=3,alpha=.5, color='r')
axvspan(fit_mu-FWHM/2, fit_mu+FWHM/2, facecolor='g', alpha=0.5)
show()

An even better approximation would filter out the noisy data below a given threshold before the fit.
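
For instance, a sketch of that idea with a boolean mask (the 10% cutoff is an arbitrary illustrative choice; see also Chris H's caveat about filtering in the comments below):

# refit using only the samples above an arbitrary 10%-of-peak threshold
mask = Y > 0.1 * Y.max()
p1, success = opt.leastsq(errfunc, p0[:], args=(X[mask], Y[mask]))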

Hooked
  • Generally speaking you shouldn't filter noisy data before fitting - though removal of a background may be a good idea. This is because any useful filtering has a real chance of distorting the data. – Chris H Jan 24 '14 at 16:53
  • Can this be applied to a multi-peak signal? – FaCoffee Aug 02 '18 at 15:52
  • @FaCoffee It could, but you'd have to change the objective function. In the simple case, if you knew that there were, say, _n_ peaks, you could adjust `errfunc` in the code above to fit _n_ Gaussians. If you don't know the number of peaks beforehand, it gets trickier, and it would probably warrant a new question on this site. – Hooked Aug 03 '18 at 13:54

Here is a nice little function using the spline approach.

import numpy as np
from scipy.interpolate import splrep, sproot

class MultiplePeaks(Exception): pass
class NoPeaksFound(Exception): pass

def fwhm(x, y, k=3):
    """
    Determine the full width at half maximum of a peaked set of points, x and y.

    Assumes that there is only one peak present in the dataset.  The function
    uses a spline interpolation of order k (splrep requires 1 <= k <= 5).
    """

    half_max = np.amax(y)/2.0
    s = splrep(x, y - half_max, k=k)
    roots = sproot(s)

    if len(roots) > 2:
        raise MultiplePeaks("The dataset appears to have multiple peaks, and "
                "thus the FWHM can't be determined.")
    elif len(roots) < 2:
        raise NoPeaksFound("No proper peaks were found in the data set; likely "
                "the dataset is flat (e.g. all zeros).")
    else:
        return abs(roots[1] - roots[0])
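
A usage sketch with the two-Gaussian data from the question:

import numpy as np

x = np.linspace(10, 110, 1000)
blue = np.exp(-(x - 50)**2 / 200.) + np.exp(-(x - 60)**2 / 200.)
print(fwhm(x, blue))  # ~26.9, matching the spline answer above
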
jdg

You can use scipy to solve this: first `scipy.signal.find_peaks` and then `scipy.signal.peak_widths`. With the default `rel_height` (0.5), you measure the width of the peak at half maximum.
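
A minimal sketch of that approach, reusing the two-Gaussian shape from the question (note that `peak_widths` reports widths in sample units, measured relative to the peak's prominence, which here equals its height because the baseline is zero):

import numpy as np
from scipy.signal import find_peaks, peak_widths

x = np.linspace(10, 110, 1000)
y = np.exp(-(x - 50)**2 / 200.) + np.exp(-(x - 60)**2 / 200.)

peaks, _ = find_peaks(y)
widths, width_heights, left_ips, right_ips = peak_widths(y, peaks, rel_height=0.5)

# convert from samples to x units (assumes evenly spaced x)
print(widths * (x[1] - x[0]))  # ~[26.9]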

eduardosufan

If you prefer interpolation over fitting:

import numpy as np

def get_full_width(x: np.ndarray, y: np.ndarray, height: float = 0.5) -> float:
    height_half_max = np.max(y) * height
    index_max = np.argmax(y)
    # np.interp needs increasing xp, so the rising edge can be used directly,
    # while the falling edge is flipped first
    x_low = np.interp(height_half_max, y[:index_max+1], x[:index_max+1])
    x_high = np.interp(height_half_max, np.flip(y[index_max:]), np.flip(x[index_max:]))

    return x_high - x_low
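
For instance, with the two-Gaussian data from the question:

import numpy as np

x = np.linspace(10, 110, 1000)
y = np.exp(-(x - 50)**2 / 200.) + np.exp(-(x - 60)**2 / 200.)
print(get_full_width(x, y))  # ~26.9, matching the spline result above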

basil_man

For a single peak sampled at many, evenly spaced points, and if there is no need for perfect accuracy, I would use:

import numpy as np

def FWHM(X, Y):
    deltax = X[1] - X[0]  # assumes evenly spaced X
    half_max = max(Y) / 2.
    # count the samples above half maximum and multiply by the sample spacing
    above = np.where(Y > half_max, 1, 0)

    return np.sum(above) * deltax
PINGU

I implemented an empirical solution in haggis.math.full_width_half_max that handles noisy and not-quite-Gaussian data fairly well. Usage is extremely straightforward:

fwhm = full_width_half_max(x, y)

The function is robust: it simply finds the maximum of the data and the nearest points crossing the "halfway down" threshold using the requested interpolation scheme.

Here are a couple of examples using data from the other answers.

@HYRY's smooth data

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import UnivariateSpline
from haggis.math import full_width_half_max

def make_norm_dist(x, mean, sd):
    return 1.0/(sd*np.sqrt(2*np.pi))*np.exp(-(x - mean)**2/(2*sd**2))

x = np.linspace(10, 110, 1000)
green = make_norm_dist(x, 50, 10)
pink = make_norm_dist(x, 60, 10)

blue = green + pink   

# create a spline of x and blue-np.max(blue)/2 
spline = UnivariateSpline(x, blue-np.max(blue)/2, s=0)
r1, r2 = spline.roots() # find the roots

# Compute using my function
fwhm, (x1, y1), (x2, y2) = full_width_half_max(x, blue, return_points=True)

# Print comparison
print('HYRY:', r2 - r1, 'MP:', fwhm)

plt.plot(x, blue)
plt.axvspan(r1, r2, facecolor='g', alpha=0.5)
plt.plot(x1, y1, 'r.')
plt.plot(x2, y2, 'r.')

For smooth data, the results are pretty exact:

HYRY: 26.891157007233254 MP: 26.891193606203814

[Plot: the blue curve with the spline-based FWHM span shaded and the two half-max crossing points marked in red]

@Hooked's Noisy Data

import scipy.optimize as opt

def gauss(x, p): # p[0]==mean, p[1]==stdev
    return 1.0/(p[1]*np.sqrt(2*np.pi))*np.exp(-(x-p[0])**2/(2*p[1]**2))

# Create some sample data
known_param = np.array([2.0, .7])
xmin,xmax = -1.0, 5.0
N = 1000
X = np.linspace(xmin,xmax,N)
Y = gauss(X, known_param)

# Add some noise
Y += .10*np.random.random(N)

# Renormalize to a proper PDF
Y /= ((xmax-xmin)/N)*Y.sum()

# Fit a Gaussian
p0 = [0, 1] # Initial guess is a standard normal distribution
errfunc = lambda p, x, y: gauss(x, p) - y # Distance to the target function
p1, success = opt.leastsq(errfunc, p0[:], args=(X, Y))

fit_mu, fit_stdev = p1

FWHM = 2*np.sqrt(2*np.log(2))*fit_stdev

# Compute using my function
fwhm, (x1, y1), (x2, y2) = full_width_half_max(X, Y, return_points=True)

# Print comparison
print('Hooked:', FWHM, 'MP:', fwhm)

plt.plot(X, Y)
plt.plot(X, gauss(X, p1), lw=3, alpha=.5, color='r')
plt.axvspan(fit_mu - FWHM / 2, fit_mu + FWHM / 2, facecolor='g', alpha=0.5)
plt.plot(x1, y1, 'r.')
plt.plot(x2, y2, 'r.')

For noisy data (with a biased baseline), the results are not as consistent.

Hooked: 1.9903193212254346 MP: 1.5039676990530118

On the one hand, the Gaussian fit is not optimal for these data; on the other, the strategy of picking the nearest point that crosses the half-max threshold is probably not optimal either.

[Plot: the noisy data, the Gaussian fit in red, the fit-based FWHM span shaded, and the nearest-crossing points marked in red]

Mad Physicist

An efficient pure-Python implementation, where `values` is a list:

def calculate_FWHM(values):
    # Find the maximum value and its index
    max_value = max(values)
    max_index = values.index(max_value)

    # Find the value that is half the maximum
    half_max = max_value / 2

    # Find the indices where the values first drop to half the maximum
    # on each side of the peak
    left_index =  next((i for i in range(max_index, -1, -1) if values[i] <= half_max), 0)
    right_index = next((i for i in range(max_index, len(values)) if values[i] <= half_max), len(values) - 1)

    # return the FWHM, measured in number of samples
    return right_index - left_index
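
Note that this returns the width in number of samples. A usage sketch converting to x units (assuming evenly spaced x):

import numpy as np

x = np.linspace(10, 110, 1000)
y = list(np.exp(-(x - 60)**2 / 200.))  # Gaussian with sd = 10
print(calculate_FWHM(y) * (x[1] - x[0]))  # ~23.5 = 2*sqrt(2*ln 2)*10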
seralouk