Finding peaks in noisy data with find_peaks_cwt

Question

I am trying to find the peaks in some very noisy data such as this:

Without understanding the terminology very well, I'm defining the peaks as narrow (width<30) and more than 100000 higher than the nearby area.

I'm trying to use scipy's find_peaks_cwt but the documentation is pretty unclear to me. I tried find_peaks_cwt(my_data, np.arange(1,30)) but it returned a huge number of peaks. Then I tried adding the noise_perc=60 argument but this didn't really fix the problem. I've also tried playing around with the other parameters but I don't really understand what a 'ridge line' is.

What should I be doing differently? Is widths=np.arange(1,30) setting my width requirement like I think it is? How do I specify a height requirement?

An alternative approach that worked better for me was to filter manually the signal first convoluting it with a Gaussian window and then search for the maxima (see [this answer](https://stackoverflow.com/a/25666951/12131616)). — Puco4, Jul 20 '21 at 16:14

score 4 · Accepted Answer · edited Sep 06 '18 at 16:20

A lot depends on what your data actually mean (or what you think they ought to mean). Here's an example with synthetic data:

from scipy.signal import find_peaks_cwt
from matplotlib.pyplot import plot, ylim
from numpy import *
N = 2000
x = arange(N)
pwid = 200.
zideal = sinc(x/pwid - 2)**2 # Vaguely similar to yours
z = zideal * random.randn(N)**2 # adding noise
plot(x, zideal, lw=4)
ylim(0, 1)
zf = find_peaks_cwt(z, pwid/4+zeros(N))
plot(x[zf], zideal[zf], '*', ms=20, color='green')
# Create averaging zones around peaks
xlow = maximum(array(zf) - pwid/2, 0)
xhigh = minimum(array(zf) + pwid/2, x.max())
zguess = 0*xlow # allocate space
for ii in range(len(zf)):
   zguess[ii] = z[xlow[ii]:xhigh[ii]].mean()
plot(x[zf], zguess, 'o', ms=10, color='red')

pwidth scales the width of the peaks in the sinc() function. In the call to find_peaks_cwt(), using larger values for widths produces fewer peaks (lower density of peaks). The best result seems to be scaling values in widths to approximately half-width at half-max (HWHM) of the peaks.

find_peaks_cwt() does a pretty respectable job of finding the peaks from the ideal data. Summing around the values is a way to guess at the peak values. If you're summing spectral power, you should probably sum all the values in between rather than a fixed interval like I did for this quick-and-dirty demo.

I find the function especially impressive in the way it finds smaller peaks in the presence of much larger ones.

It seems like the primary feature differentiating your peak search is `pwid/4+zeros(N)`. Can you explain how you selected that? I don't understand either how you picked the magnitude or that it should be an array the same length as the number of points. — ericksonla, Feb 16 '17 at 22:35
From the documentation, widths is: "1-D array of widths to use for calculating the CWT matrix. In general, this range should cover the expected width of peaks of interest." In this case, I scaled the interval in sinc() to pwid. The "widths" array is one-sided width, so I scaled it to approximate the HWHM of the peaks. — Frank M, Feb 17 '17 at 05:28

Finding peaks in noisy data with find_peaks_cwt

1 Answers1