2

I just generated a plot that looks like below; I am trying to count how many full waves are there in this chart. As annotated, there should be 7 full waves. I tried to smooth the chart but end up nowhere. I also researched find_peaks in scipy but it doesn't look like apply to this situation as one wave could have multiple peaks.

enter image description here

EDIT:

This question comes from a problem in reality: figuring out the number of shelves in the picture (see below):

enter image description here

After thresholding, I get the data file below:

Original: file-1

Average on rows: file-2

The file-2 result in the plot shown at the beginning.

lovechillcool
  • 762
  • 2
  • 10
  • 32
  • How do you define "full waves"? Above and then below some threshold? – Stack Tracer Aug 04 '20 at 05:13
  • @StackTracer This is the tricky part. I tagged 7 full waves in the pictures. – lovechillcool Aug 04 '20 at 05:14
  • Seems that you can set values to 1, when below mean value, and 0 above mean, then use find_peaks – Stefan Aug 04 '20 at 05:17
  • @lovechillcool yes, it tends to be the majority of the problem when trying to analyze real-world data. – Stack Tracer Aug 04 '20 at 05:19
  • In this case it seems like a fixed threshold of 130-140 ought to work, but that also depends on the variability of your data and how accurate you need to be (e.g. 4 is 1 wave, but 6/7 count as 2 in your case). – Stack Tracer Aug 04 '20 at 05:21
  • @Stefan Is it a good idea if I could smooth the plot to simplify the problem? Maybe DFF or FFF? – lovechillcool Aug 04 '20 at 05:50
  • 1
    Smoothing is difficult with peaks like this, but maybe you can use Savitzky–Golay filter ( from scipy.signal import savgol_filter). It's easy to use. – Stefan Aug 04 '20 at 06:14
  • For the edited question: I still think it will make sense to threshold the data, i.e. convert to 0 or 1 depending on the value being above/below some value (some percentage of maximum or average). In some cases hysteresis is needed, where the return to zero is conditioned on a lower threshold than the one used for switching to 1. – Stefan Aug 07 '20 at 17:13

1 Answers1

2

I think this is a hard problem to answer in general, because there are datasets where a solution might work, and there are datasets where it does not. Your data is upside down, so the first step is flipping y to -y, so the minimums will be interpreted as maximums (and maybe take absolute value to avoid dealing with negative numbers.)

The first option is to use scipy.signal.find_peaks. Knowing your data here you can utilize some of the parameters: in my experience height, distance and prominence are the most useful. There is a nice explanation about the parameters of find_peaks. This will correctly identify the peaks in most cases, but requires time to appopriately set the arguments.

A similar solution with scipy.signal.find_peaks_cwt, here (in most cases) you will need to adjust the widths parameter, which is (from the docs):

1-D array of widths to use for calculating the CWT matrix. In general, this range should cover the expected width of peaks of interest.

But again, this requires some prior knowledge about your data.

Because you have periodic data, maybe you can make use of FFT to find the characteristic frequenies to adjust the parameters inside find_peaks and find_peaks_cwt. Since you haven't provided the dataset I have only synthetic data to deal with. Note that I return len(peaks) - 1 because usually on boundaries there is an extra period which is counted.

import numpy as np
from scipy.signal import find_peaks, find_peaks_cwt
import matplotlib.pyplot as plt

# some generic data
x = np.linspace(0, 1000, 10000)
y = 250 + 100 * np.sin(0.08 * x) - np.random.normal(30, 20, 10000)

def count_waves_1(x, y):
    peaks, props = find_peaks(y, prominence=120, height= np.max(y) / 10, distance=200)
    # here you can make use of props to filter the peaks by different properties,
    # for example extract only the n largest prominence peak:
    #
    # ind = np.argpartition(props["prominences"], -n_largest)[-n_largest:]
    # peaks = peaks[ind]

    plt.plot(x, y)
    plt.plot(x[peaks], y[peaks], 'ro')
    return len(peaks) - 1

first_solution = count_waves_1(x, y)


def count_waves_2(x, y):
    peaks = find_peaks_cwt(y, widths=np.arange(100, 200))
    plt.plot(x, y)
    plt.plot(x[peaks], y[peaks], 'ro')
    return len(peaks) - 1

second_solution = count_waves_2(x, y)

print(first_solution, second_solution)

Péter Leéh
  • 2,069
  • 2
  • 10
  • 23