
[Image: plot of the captured spectrum, showing several intensity peaks along the X axis]

Operators used to examine the spectrum by eye, noting the location and width of each peak to judge which piece the spectrum belonged to. In the new setup, the image is captured by a camera onto a screen, and the width of each band must be computed programmatically.

Old system: spectroscope -> human eye
New system: spectroscope -> camera -> program

What is a good method to compute the width of each band, given their approximate X-axis positions? This task used to be performed perfectly well by eye, and must now be performed by a program.

Sorry if I am short of details, but they are scarce.


Program listing that generated the previous graph; I hope it is relevant:

import Image
from scipy import *
from scipy.optimize import leastsq

# Load the picture with PIL, process if needed
pic         = asarray(Image.open("spectrum.jpg"))

# Average over the colour channels, then project onto the X axis
pic_avg     = pic.mean(axis=2)
projection  = pic_avg.sum(axis=0)

# Normalise, then shift the minimum to zero for a nice fit
projection /= projection.mean()
projection -= projection.min()

#print projection

# Fit function, two gaussians, adjust as needed
def fitfunc(p,x):
    return p[0]*exp(-(x-p[1])**2/(2.0*p[2]**2)) + \
        p[3]*exp(-(x-p[4])**2/(2.0*p[5]**2))
errfunc = lambda p, x, y: fitfunc(p,x)-y

# Use scipy to fit, p0 is the initial guess
p0 = array([0,20,1,0,75,10])
X  = arange(len(projection))
p1, success = leastsq(errfunc, p0, args=(X,projection))
Y = fitfunc(p1,X)

# Output the result
print "Mean values at: ", p1[1], p1[4]

# Plot the result
from pylab import *
#subplot(211)
#imshow(pic)
#subplot(223)
#plot(projection)
#subplot(224)
#plot(X,Y,'r',lw=5)
#show()

subplot(311)
imshow(pic)
subplot(312)
plot(projection)
subplot(313)
plot(X,Y,'r',lw=5)
show()
Jesvin Jose
    Find the threshold at which the human eye can't distinguish the color from the background. It should be fairly constant. Then, just threshold your data points such that the ones above that threshold would be "seen" by the human eye and just cluster those data points and find the width of each cluster. – Blender May 26 '12 at 07:56
  • Maybe take the derivative of the curve to find peaks and slopes easily? Aren't the width of the bands pretty constant? It's the amplitudes and positions that vary as I understand it. – Lennart Regebro May 26 '12 at 08:17
  • @Blender, could you elaborate on your point? – Jesvin Jose May 26 '12 at 09:01
  • Is the noise floor reasonably flat? How do you determine (by eye) what's a peak and what's too low? For example, is that blip at about 660–670 a doublet, or would it not be counted? What about at 740? – detly May 26 '12 at 09:05
  • @detly, I am not perfectly informed, but the position of the peaks would be known beforehand. And yes, I am concerned if two known peaks are too close to each other. – Jesvin Jose May 26 '12 at 09:22

3 Answers


Given an approximate starting point, you could use a simple algorithm that finds the local maximum closest to that point. Your fitting code may be doing that already (I wasn't sure whether you were using it successfully or not).

Here's some code that demonstrates simple peak finding from a user-given starting point:

#!/usr/bin/env python
from __future__ import division
import numpy as np
from matplotlib import pyplot as plt

# Sample data with two peaks: small one at t=0.4, large one at t=0.8
ts = np.arange(0, 1, 0.01)
xs = np.exp(-((ts-0.4)/0.1)**2) + 2*np.exp(-((ts-0.8)/0.1)**2)

# Say we have an approximate starting point of 0.35
start_point = 0.35

# Nearest index in "ts" to this starting point is...
start_index = np.argmin(np.abs(ts - start_point))

# Find the local maxima in our data by looking for a sign change in
# the first difference
# From http://stackoverflow.com/a/9667121/188535
maxes = (np.diff(np.sign(np.diff(xs))) < 0).nonzero()[0] + 1

# Find which of these peaks is closest to our starting point
index_of_peak = maxes[np.argmin(np.abs(maxes - start_index))]

print "Peak centre at: %.3f" % ts[index_of_peak]

# Quick plot showing the results: blue line is data, green dot is
# starting point, red dot is peak location
plt.plot(ts, xs, '-b')
plt.plot(ts[start_index], xs[start_index], 'og')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.show()

This method will only work if the ascent up the peak is perfectly smooth from your starting point. If it needs to be more resilient to noise, PyDSTool might help (I have not used it myself). This SciPy post details how to use it for detecting 1D peaks in a noisy data set.
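As a rough alternative for noisy data (a sketch only, not the PyDSTool approach: synthetic data and the moving-average window size are made up for illustration), you can smooth before applying the same sign-change test:

```python
import numpy as np

# Noisy version of a single Gaussian peak centred at t = 0.5 (index 50)
rng = np.random.RandomState(0)
ts = np.arange(0, 1, 0.01)
xs = np.exp(-((ts - 0.5) / 0.1) ** 2) + 0.05 * rng.randn(len(ts))

# Smooth with a simple moving average before looking for sign changes
window = 5
kernel = np.ones(window) / window
smoothed = np.convolve(xs, kernel, mode='same')

# Same sign-change-in-first-difference peak test as above, on smoothed data
maxes = (np.diff(np.sign(np.diff(smoothed))) < 0).nonzero()[0] + 1
```

The smoothing suppresses most of the spurious sign changes that raw noise would produce, at the cost of slightly shifting and flattening the true peak.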

So assume at this point you've found the centre of the peak. Now for the width: there are several methods you could use, but the easiest is probably the "full width at half maximum" (FWHM). Again, this is simple and therefore fragile. It will break for close double-peaks, or for noisy data.

The FWHM is exactly what its name suggests: you find the width of the peak where it's at half of its maximum. Here's some code that does that (it just continues on from above):

# FWHM...
half_max = xs[index_of_peak]/2

# This finds where in the data we cross over the halfway point to our peak. Note
# that this is global, so we need an extra step to refine these results to find
# the closest crossovers to our peak.

# Same sign-change-in-first-diff technique as above
hm_left_indices = (np.diff(np.sign(np.diff(np.abs(xs[:index_of_peak] - half_max)))) > 0).nonzero()[0] + 1
# Add "index_of_peak" to result because we cut off the left side of the data!
hm_right_indices = (np.diff(np.sign(np.diff(np.abs(xs[index_of_peak:] - half_max)))) > 0).nonzero()[0] + 1 + index_of_peak

# Find closest half-max index to peak
hm_left_index = hm_left_indices[np.argmin(np.abs(hm_left_indices - index_of_peak))]
hm_right_index = hm_right_indices[np.argmin(np.abs(hm_right_indices - index_of_peak))]

# And the width is...    
fwhm = ts[hm_right_index] - ts[hm_left_index]

print "Width: %.3f" % fwhm

# Plot to illustrate FWHM: blue line is data, red circle is peak, red line
# shows FWHM
plt.plot(ts, xs, '-b')
plt.plot(ts[index_of_peak], xs[index_of_peak], 'or')
plt.plot(
    [ts[hm_left_index], ts[hm_right_index]],
    [xs[hm_left_index], xs[hm_right_index]], '-r')
plt.show()

It doesn't have to be the full width at half maximum — as one commenter points out, you can try to figure out where your operators' normal threshold for peak detection is, and turn that into an algorithm for this step of the process.

A more robust way might be to fit a Gaussian curve (or your own model) to a subset of the data centred around the peak — say, from a local minimum on one side to a local minimum on the other — and use one of the parameters of that curve (e.g. sigma) to calculate the width.
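A minimal sketch of that idea, reusing the synthetic two-peak data from the first listing and `scipy.optimize.curve_fit` (the fitting window bounds here are hand-picked for this example); for a Gaussian, sigma converts to a FWHM via the standard factor 2*sqrt(2*ln 2):

```python
import numpy as np
from scipy.optimize import curve_fit

# Same sample data as above: small peak at t=0.4, large one at t=0.8
ts = np.arange(0, 1, 0.01)
xs = np.exp(-((ts - 0.4) / 0.1) ** 2) + 2 * np.exp(-((ts - 0.8) / 0.1) ** 2)

def gaussian(t, amplitude, centre, sigma):
    return amplitude * np.exp(-((t - centre) ** 2) / (2 * sigma ** 2))

# Fit only a window around the peak of interest (here the one near t = 0.8)
window = (ts > 0.6) & (ts < 1.0)
popt, _ = curve_fit(gaussian, ts[window], xs[window], p0=[1.0, 0.8, 0.05])

# FWHM of a Gaussian is 2*sqrt(2*ln(2)) * sigma
fwhm = 2 * np.sqrt(2 * np.log(2)) * abs(popt[2])
```

Restricting the fit to a window keeps the neighbouring peak from dragging the parameters off, as long as the window doesn't overlap it too much.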

I realise this is a lot of code, but I've deliberately avoided factoring out the index-finding functions to "show my working" a bit more, and of course the plotting functions are there just to demonstrate.

Hopefully this gives you at least a good starting point to come up with something more suitable to your particular set.

detly
  • Could you include a modification that throws an exception if the "base" is lower than the threshold (0.5 of peak in this case)? – Jesvin Jose Jun 07 '12 at 06:08
  • @aitchnyu Well, here I've made a bit of an oversimplification: I just said 0.5 of total height, but FWHM is better measured as half of the height of the peak *above the base* (or noise floor). So it doesn't make sense for the base to be lower than 0.5 of the peak. – detly Jun 08 '12 at 02:36
  • Yes, and I want the algorithm to throw an exception if it meets an ill-formed peak: one which doesn't fall monotonically down to 0.5 of the peak value (see the doublet in my graph). And I need **only** the width; the amplitude can change due to the instrument's properties. The "noise floor" should ideally be silent, corresponding to the intensity of black. – Jesvin Jose Jun 08 '12 at 05:18
  • @aitchnyu Ah, I get you. I don't have time to do it myself, but you could implement something that looks at the first difference over the range of the peak. It should be strictly positive to the left of the peak, and strictly negative to the right. By definition, it's zero right at the peak, although maybe you should allow for zero values on either side too. – detly Jun 08 '12 at 05:20
  • So `hm_left_index:index_of_peak` of the first difference must be all positive and`index_of_peak:hm_right_index` must be all negative, right? – Jesvin Jose Jun 08 '12 at 05:30
  • @aitchnyu Yes (or maybe `>= 0`). In the first difference though, not the actual values. – detly Jun 08 '12 at 06:41
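The validity check discussed in these comments could be sketched as follows (a hypothetical helper, assuming the `xs`, `hm_left_index`, `index_of_peak` and `hm_right_index` variables from the listings above):

```python
import numpy as np

def check_peak_shape(xs, hm_left_index, index_of_peak, hm_right_index):
    """Raise ValueError for an ill-formed peak: one that does not rise
    monotonically up to its maximum and fall monotonically back down to
    the half-maximum crossings on either side."""
    d = np.diff(xs)
    # The first difference must be non-negative on the way up...
    if not np.all(d[hm_left_index:index_of_peak] >= 0):
        raise ValueError("peak does not rise monotonically on the left")
    # ...and non-positive on the way down
    if not np.all(d[index_of_peak:hm_right_index] <= 0):
        raise ValueError("peak does not fall monotonically on the right")
```

A shoulder or an unresolved doublet inside the FWHM range shows up as a sign change in the first difference and trips the exception.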

Late to the party, but for anyone coming across this question in the future...

Eye movement data looks very similar to this; I'd base an approach on the one used by Nyström and Holmqvist, 2010:

  • Smooth the data using a Savitzky-Golay filter (scipy.signal.savgol_filter in scipy v0.14+) to get rid of some of the low-level noise while keeping the large peaks intact. The authors recommend an order of 2 and a window size of about twice the width of the smallest peak you want to be able to detect.
  • Find where the bands are by removing all values above a certain y value (set them to numpy.nan). Take the (nan)mean and (nan)standard deviation of the remainder, and remove all values greater than the mean + [parameter]*std (I think they use 6 in the paper). Iterate until you're not removing any data points; depending on your data, though, certain values of [parameter] may not stabilise.
  • Use numpy.isnan() to find events vs non-events, and numpy.diff() to find the start and end of each event (values of -1 and 1 respectively).
  • To get even more accurate start and end points, scan along the data backward from each start and forward from each end to find the nearest local minimum with a value smaller than mean + [another parameter]*std (I think they use 3 in the paper).
  • Then you just need to count the data points between each start and end.

This won't work for that double peak; you'd have to do some extrapolation for that.
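The smoothing and iterative-threshold steps might be sketched like this (everything here is synthetic and the parameter values are made up for this toy example; the paper suggests a larger rejection factor than the one used below):

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic spectrum: a low uniform noise floor plus two bands
rng = np.random.RandomState(1)
xs = 0.05 * rng.rand(500)
xs[100:120] += np.hanning(20)        # narrow band around index 110
xs[300:340] += 2 * np.hanning(40)    # wide band around index 320

# 1. Smooth (polyorder 2; window chosen to suit the narrowest band here)
smoothed = savgol_filter(xs, window_length=21, polyorder=2)

# 2. Iteratively estimate the noise floor: NaN out values above
#    mean + k*std until nothing more is removed.
#    (The paper uses k = 6; k = 3 suits this small synthetic example.)
work = smoothed.copy()
k = 3
while True:
    thresh = np.nanmean(work) + k * np.nanstd(work)
    with np.errstate(invalid='ignore'):  # NaN comparisons are expected
        above = work > thresh
    if not above.any():
        break
    work[above] = np.nan

# 3. Events are the NaN runs; pad the mask so runs touching the array
#    ends still pair up, then diff marks starts (+1) and ends (-1)
mask = np.concatenate(([0], np.isnan(work).astype(int), [0]))
edges = np.diff(mask)
starts = (edges == 1).nonzero()[0]
ends = (edges == -1).nonzero()[0]   # one past the last NaN of each run
widths = ends - starts
```

This stops short of the refinement step (scanning outward to the nearest qualifying local minimum), which would adjust `starts` and `ends` before the widths are counted.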

Chris L. Barnes

The best method might be to statistically compare a bunch of methods with human results.

You would take a large variety of data and a large variety of measurement estimates (widths at various thresholds, areas above various thresholds, different threshold selection methods, second moments, polynomial curve fits of various degrees, pattern matching, etc.) and compare these estimates to human measurements of the same data set. Pick the estimation method that correlates best with expert human results. Or pick several methods: the best one for each of various peak heights, various separations from other peaks, etc.
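A minimal sketch of that comparison (all numbers here are hypothetical: the human measurements and the two candidate methods' outputs are invented for illustration):

```python
import numpy as np

# Hypothetical: band widths measured by expert operators on 10 spectra
human_widths = np.array([4.1, 5.0, 3.2, 6.8, 4.4, 5.9, 3.7, 4.9, 6.1, 5.3])

# Hypothetical: widths estimated by two candidate methods on the same spectra.
# Method A tracks the human values closely; method B is biased and noisy.
method_a = human_widths + np.array(
    [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 0.0, -0.3, 0.1, 0.2])
method_b = 0.5 * human_widths + 2.0 + np.array(
    [0.9, -0.8, 0.7, -0.9, 0.8, -0.7, 0.9, -0.8, 0.7, -0.9])

# Correlate each method's estimates with the human results
corr_a = np.corrcoef(human_widths, method_a)[0, 1]
corr_b = np.corrcoef(human_widths, method_b)[0, 1]

# Keep the method that agrees best with the experts
best = 'A' if corr_a > corr_b else 'B'
```

In practice you would also want to look at absolute error, not just correlation, since a method can correlate well while being systematically biased.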

hotpaw2