
I am looking for an efficient way to detect plateaus in otherwise very noisy data. The plateaus are always relatively broad. A simple example of what this data could look like:

import numpy as np
import matplotlib.pyplot as plt

test = np.random.uniform(0.9, 1, 100)
test[10:20] = 0
plt.plot(test)


Note that there can be multiple plateaus (which should all be detected) which can have different values.

I've tried using scipy.signal.argrelextrema, but it doesn't seem to be doing what I want it to:

from scipy.signal import argrelextrema

peaks = argrelextrema(test, np.less, order=25)
plt.vlines(peaks, ymin=0, ymax=1)


I don't need the exact interval of the plateau; a rough range estimate would be enough, as long as that estimate is bigger than or equal to the actual plateau range. It should be relatively efficient, however.

emilaz
  • Is there ever more than one plateau? Are the plateaus always zeroed values? If they're zero-values, do you need more than just getting the indices of zero-values, for example by `np.where(test==0)`? – A Kruger Nov 27 '18 at 04:32
  • Hey, thanks for the good questions. Yes there can be more than one plateau and no, the plateaus can have different values. Will add that to the question. – emilaz Nov 27 '18 at 04:44
  • Is there, as in [this question](https://stackoverflow.com/questions/15112964/digitizing-an-analog-signal), an intermediate value (such as y=0.5) that reliably separates the low groups of values from the high groups of values? – Warren Weckesser Nov 27 '18 at 05:19

3 Answers


There is a method scipy.signal.find_peaks that you can try; here is an example:

import numpy
from scipy.signal import find_peaks

test = numpy.random.uniform(0.9, 1.0, 100)
test[10:20] = 0
peaks, peak_plateaus = find_peaks(-test, plateau_size=1)

Although find_peaks only finds peaks, it can be used to find valleys if the array is negated. You can then do the following:

for i in range(len(peak_plateaus['plateau_sizes'])):
    if peak_plateaus['plateau_sizes'][i] > 1:
        print('a plateau of size %d is found' % peak_plateaus['plateau_sizes'][i])
        print('its left index is %d and right index is %d' % (peak_plateaus['left_edges'][i], peak_plateaus['right_edges'][i]))

it will print

a plateau of size 10 is found
its left index is 10 and right index is 19
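Since the question mentions multiple plateaus at different levels, the same idea extends directly. Here is a seeded sketch; the second plateau's position and value are made up for illustration:

```python
import numpy as np
from scipy.signal import find_peaks

rng = np.random.default_rng(0)
test = rng.uniform(0.9, 1.0, 100)
test[10:20] = 0     # first plateau
test[50:60] = 0.5   # second plateau at a different level

# Negate so valleys become peaks; plateau_size=2 drops the
# single-sample minima produced by the noise
peaks, props = find_peaks(-test, plateau_size=2)

for left, right, size in zip(props['left_edges'],
                             props['right_edges'],
                             props['plateau_sizes']):
    print('plateau of size %d from index %d to %d' % (size, left, right))
```

Because the noise is drawn from a continuous distribution, two consecutive equal samples are essentially impossible outside a plateau, so `plateau_size=2` is enough to suppress it.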
zyy

This is really just a "dumb" machine learning task. You'll want to code a custom function to screen for them. You have two key characteristics to a plateau:

  1. They're consecutive occurrences of the same value (or very nearly so).
  2. The first and last points deviate strongly from a forward and backward moving average, respectively. (Try quantifying this based on the standard deviation if you expect additive noise, for geometric noise you'll have to take the magnitude of your signal into account too.)

A simple loop should then be sufficient to calculate a forward moving average, stdev of points in that forward moving average, reverse moving average, and stdev of points in that reverse moving average.

  • Read until you find a point well outside the regular noise (compare to variance). Start buffering those indices into a list.
  • Keep reading and buffering indices into that list while they have the same value (or nearly the same, if your plateaus can be a little rough; you'll want to use some tolerance plus the standard deviation of your plateaus, or just some tolerance if you expect them all to behave similarly).
  • If the variance of the points in your buffer gets too high, it's not a plateau, too rough; throw it out and start scanning again from your current position.
  • If the last value was very different from the previous (on the order of the change that triggered your code to start buffering indices) and in the opposite direction of the original impulse, cap your buffer here; you've got a plateau there.
  • Now do whatever you want with the points at those indices. Delete them, replace them with a linear interpolation between the two boundary points, whatever.

I could generate some noise and give you some sample code, but this is really something you're going to have to adapt to your application. (For example, there's a shortcoming in this method that a plateau which captures a point on the middle of the "cliff edge" may leave that point when it removes the rest of the plateau. If that's something you're worried about, you'll have to do a little more exploring after you ID the plateau.) You should be able to do this in a single pass over the data, but it might be wise to get some statistics on the whole set first to intelligently tweak your thresholds.

If you have an exact definition of what constitutes a plateau, you can make this a lot less hand-wavy and ML-looking, but as long as you're trying to identify a fuzzy pattern, you're going to have to take a statistics-based approach.
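The steps above might be sketched as a single-pass loop like the one below. This is a rough sketch under my own assumptions: the function name, the threshold values, and the opposite-sign check on the closing jump are illustrative choices, not a definitive implementation.

```python
import numpy as np

def find_plateau_ranges(x, jump_thresh=3.0, flat_tol=0.05, min_len=3):
    # Rough noise scale from the sample-to-sample steps of the whole signal
    sigma = np.std(np.diff(x))
    plateaus = []
    i = 1
    n = len(x)
    while i < n:
        step = x[i] - x[i - 1]
        # Step 1: a jump well outside the regular noise starts a buffer
        if abs(step) > jump_thresh * sigma:
            start = i
            # Step 2: keep buffering while values stay near the plateau level
            while i + 1 < n and abs(x[i + 1] - x[start]) < flat_tol:
                i += 1
            # Steps 3-4: accept only long-enough flat runs that end with a
            # jump of similar size in the opposite direction of the entry jump
            if i + 1 < n:
                exit_step = x[i + 1] - x[i]
                if (i - start + 1 >= min_len
                        and abs(exit_step) > jump_thresh * sigma
                        and np.sign(exit_step) == -np.sign(step)):
                    plateaus.append((start, i))
        i += 1
    return plateaus

# Demo on the question's toy signal (seeded so the result is repeatable)
rng = np.random.default_rng(0)
test = rng.uniform(0.9, 1.0, 100)
test[10:20] = 0
print(find_plateau_ranges(test))
```

You can then delete or interpolate over the returned index ranges, as described above.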

geofurb

I had a similar problem, and found a simple heuristic solution shared below. I find plateaus as ranges of constant gradient of the signal. You could change the code to also check that the gradient is (close to) 0.

I apply a moving average (uniform_filter1d) to filter out noise. I also calculate the first and second derivatives of the signal numerically, so I'm not sure it meets the efficiency requirement. But it worked perfectly for my signal and might be a good starting point for others.

def find_plateaus(F, min_length=200, tolerance = 0.75, smoothing=25):
    '''
    Finds plateaus of signal using second derivative of F.

    Parameters
    ----------
    F : Signal.
    min_length: Minimum length of plateau.
    tolerance: Number between 0 and 1 indicating how tolerant
        the requirement of constant slope of the plateau is.
    smoothing: Size of uniform filter 1D applied to F and its derivatives.
    
    Returns
    -------
    plateaus: array of plateau left and right edges pairs
    dF: (smoothed) derivative of F
    d2F: (smoothed) Second Derivative of F
    '''
    import numpy as np
    from scipy.ndimage import uniform_filter1d
    
    # calculate smooth gradients
    smoothF = uniform_filter1d(F, size = smoothing)
    dF = uniform_filter1d(np.gradient(smoothF),size = smoothing)
    d2F = uniform_filter1d(np.gradient(dF),size = smoothing)
    
    def zero_runs(x):
        '''
        Helper function for finding sequences of 0s in a signal
        https://stackoverflow.com/questions/24885092/finding-the-consecutive-zeros-in-a-numpy-array/24892274#24892274
        '''
        iszero = np.concatenate(([0], np.equal(x, 0).view(np.int8), [0]))
        absdiff = np.abs(np.diff(iszero))
        ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
        return ranges
    
    # Find ranges where second derivative is zero
    # Values under eps are assumed to be zero.
    eps = np.quantile(abs(d2F),tolerance) 
    smalld2F = (abs(d2F) <= eps)
    
    # Find runs of constant values in the mask "smalld2F"
    p = zero_runs(np.diff(smalld2F))
    
    # zero_runs also returns runs where the mask is constantly False
    # (i.e. where |d2F| stays above eps), so keep only the True runs
    p = p[smalld2F[p[:, 0]]]
    
    # np.diff(p) gives the length of each range found.
    # only accept plateaus of min_length
    plateaus = p[(np.diff(p) > min_length).flatten()]
    
    return (plateaus, dF, d2F)
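As a rough usage sketch, here is the same pipeline condensed inline so it runs standalone. The longer signal, the seed, the parameter values, and the extra filter that drops constant-but-large-curvature runs are my own illustrative choices:

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

rng = np.random.default_rng(0)
F = rng.uniform(0.9, 1.0, 1000)
F[200:300] = 0  # one broad plateau

smoothing, tolerance, min_length = 25, 0.75, 20

# Same pipeline as find_plateaus: smooth, then two smoothed derivatives
smoothF = uniform_filter1d(F, size=smoothing)
dF = uniform_filter1d(np.gradient(smoothF), size=smoothing)
d2F = uniform_filter1d(np.gradient(dF), size=smoothing)

eps = np.quantile(np.abs(d2F), tolerance)
smalld2F = np.abs(d2F) <= eps

# zero_runs, inlined: ranges where the mask does not change value
iszero = np.concatenate(([0], np.equal(np.diff(smalld2F), 0).view(np.int8), [0]))
ranges = np.where(np.abs(np.diff(iszero)) == 1)[0].reshape(-1, 2)

# Keep long runs, and only those where d2F is actually small
# (constant runs of large |d2F| are flagged by zero_runs too)
long_enough = (np.diff(ranges) > min_length).flatten()
plateaus = ranges[long_enough & smalld2F[ranges[:, 0]]]
print(plateaus)
```

The printed ranges include one covering the interior of the true plateau. Note that the second-derivative approach tends to shrink toward the plateau's interior rather than overshoot its edges, so you may want to pad the reported ranges by roughly the smoothing width.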