3

I am trying to find the peaks in some very noisy data such as this:

Without understanding the terminology very well, I'm defining the peaks as narrow (width<30) and more than 100000 higher than the nearby area.

I'm trying to use scipy's find_peaks_cwt but the documentation is pretty unclear to me. I tried find_peaks_cwt(my_data, np.arange(1,30)) but it returned a huge number of peaks. Then I tried adding the noise_perc=60 argument but this didn't really fix the problem. I've also tried playing around with the other parameters but I don't really understand what a 'ridge line' is.

What should I be doing differently? Is widths=np.arange(1,30) setting my width requirement like I think it is? How do I specify a height requirement?

Chris Martin
  • 30,334
  • 10
  • 78
  • 137
ericksonla
  • 1,247
  • 3
  • 18
  • 34
  • An alternative approach that worked better for me was to filter manually the signal first convoluting it with a Gaussian window and then search for the maxima (see [this answer](https://stackoverflow.com/a/25666951/12131616)). – Puco4 Jul 20 '21 at 16:14

1 Answers1

4

A lot depends on what your data actually mean (or what you think they ought to mean). Here's an example with synthetic data:

from scipy.signal import find_peaks_cwt
from matplotlib.pyplot import plot, ylim
from numpy import *
N = 2000
x = arange(N)
pwid = 200.
zideal = sinc(x/pwid - 2)**2 # Vaguely similar to yours
z = zideal * random.randn(N)**2 # adding noise
plot(x, zideal, lw=4)
ylim(0, 1)
zf = find_peaks_cwt(z, pwid/4+zeros(N))
plot(x[zf], zideal[zf], '*', ms=20, color='green')
# Create averaging zones around peaks
xlow = maximum(array(zf) - pwid/2, 0)
xhigh = minimum(array(zf) + pwid/2, x.max())
zguess = 0*xlow # allocate space
for ii in range(len(zf)):
   zguess[ii] = z[xlow[ii]:xhigh[ii]].mean()
plot(x[zf], zguess, 'o', ms=10, color='red')

pwidth scales the width of the peaks in the sinc() function. In the call to find_peaks_cwt(), using larger values for widths produces fewer peaks (lower density of peaks). The best result seems to be scaling values in widths to approximately half-width at half-max (HWHM) of the peaks.

find_peaks_cwt() does a pretty respectable job of finding the peaks from the ideal data. Summing around the values is a way to guess at the peak values. If you're summing spectral power, you should probably sum all the values in between rather than a fixed interval like I did for this quick-and-dirty demo.

ploted as in example

I find the function especially impressive in the way it finds smaller peaks in the presence of much larger ones.

jacanterbury
  • 1,435
  • 1
  • 26
  • 36
Frank M
  • 1,550
  • 15
  • 15
  • 1
    It seems like the primary feature differentiating your peak search is `pwid/4+zeros(N)`. Can you explain how you selected that? I don't understand either how you picked the magnitude or that it should be an array the same length as the number of points. – ericksonla Feb 16 '17 at 22:35
  • From the documentation, widths is: "1-D array of widths to use for calculating the CWT matrix. In general, this range should cover the expected width of peaks of interest." In this case, I scaled the interval in sinc() to pwid. The "widths" array is one-sided width, so I scaled it to approximate the HWHM of the peaks. – Frank M Feb 17 '17 at 05:28