How to find highest peaks in time series data between points of 0 in python?

Question

I am trying to take my time series data and isolate all data between points of 0, and then identify those intervals with the highest peaks. I am working in python.

Referring to this graph: time series data with peaks and valleys identified

Source: https://tcoil.info/find-peaks-and-valleys-in-dataset-with-python/

Noting that the first and last red valley points are at 0, I want to find a way to take time series data, identify all points at 0 on the y-axis, and then isolate the data in between. For the graph I linked to here, I would want to isolate all data between the first and last red valley point. I want to do this across an entire time series data set, where data between points of 0 on the y-axis are isolated. Now that those intervals are isolated (representing different events/cycles throughout the data), I want to record the highest point within each of these intervals. Then I want to find the intervals with the 5 highest peaks (one peak per each interval). Lastly, I want to output the interval (or range) that contains these top 5 peaks. For context, each of these intervals represents an event/cycle, and I want to find the most extreme. As such, I would want an output that tells me essentially that the most extreme event/cycle occurred between 3/5/20 and 3/24/20.

How can this be done in python? Would I need to smooth the data first? How would I go about isolating data between points of 0 on the y-axis? I am trying to figure out which direction to go in first, and do not have code yet.

Related: https://stackoverflow.com/questions/1713335/peak-finding-algorithm-for-python-scipy and https://stackoverflow.com/questions/62537703/how-to-find-inflection-point-in-python — dmn, Jun 17 '21 at 18:26

enzo · Accepted Answer · 2021-05-19T02:32:01.633

Let's use the data from the page you linked. I'll add detailed explanations as comments.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-1, 3, 1000)
y = -0.1 * np.cos(12*x) + np.exp(-(1-x)**2)

I want to find a way to take time series data, identify all points at 0 on the y-axis, and then isolate the data in between

So basically, you want to separate consecutive runs of points above y=0 from those below it. Based on this answer, you can do something like this:

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')    

# Find all consecutive chunks that are above y=0
for start, stop in contiguous_regions(y > 0):
    ax.plot(x[start:stop], y[start:stop], color='red')

# Find all consecutive chunks that are below y=0
for start, stop in contiguous_regions(y < 0):
    ax.plot(x[start:stop], y[start:stop], color='blue')

ax.axhline(0, color='grey')
plt.show()
plt.close()

Blue points are below y=0 and red points are above, as you can see. They are now isolated.
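
The loops above call contiguous_regions, which is not defined in this post (it presumably comes from the linked answer). The following is only a minimal sketch of such a helper, under the assumption that it takes a boolean array and returns (start, stop) index pairs with stop exclusive, which is how the loops use it. It is not necessarily the linked answer's exact code.

```python
import numpy as np

def contiguous_regions(condition):
    """Return an (n, 2) array of [start, stop) index pairs for each run of True.

    Assumed behaviour, inferred from how the loops above slice x[start:stop].
    """
    condition = np.asarray(condition, dtype=bool)
    # Pad with False on both sides so runs touching either end are closed off
    padded = np.concatenate(([False], condition, [False]))
    # +1 marks the first index of a run, -1 marks one past its last index
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    stops = np.flatnonzero(changes == -1)
    return np.column_stack((starts, stops))

# Tiny check: two runs of True, at indices 0-1 and at index 3
print(contiguous_regions(np.array([True, True, False, True])))
```

Because stop is exclusive, x[start:stop] covers exactly the run, and an array with no True values simply produces no iterations.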

For the graph I linked to here, I would want to isolate all data between the first and last red valley point.

You can also do that. For each chunk, we need to find its valleys, which you can do with the approach from the link you sent!

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')    
for start, stop in contiguous_regions(y > 0):
    x_chunk, y_chunk = x[start:stop], y[start:stop] 
    ax.plot(x_chunk, y_chunk, color='red')
    
    # Find all the valleys
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    
    # If there are more than two valleys, take the first and the last
    if valleys.size > 2:
        # Get'em!
        iv0, *_, iv1 = valleys
        
        # Plot'em!
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color='red', linewidth=4)
    
for start, stop in contiguous_regions(y < 0):
    x_chunk, y_chunk = x[start:stop], y[start:stop] 
    ax.plot(x_chunk, y_chunk, color='blue')
    
    # The same.
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color='blue', linewidth=4)
        
ax.axhline(0, color='grey')
plt.show()
plt.close()

See that big red line? It's ours, by right.
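
The valley expression used above is compact, so here is a small sketch of what each step does on a tiny array. The function name find_valleys and the demo values are made up for illustration; the expression itself is the one from the code above.

```python
import numpy as np

def find_valleys(y):
    """Indices of local minima (valleys), using the same expression as above."""
    slope_sign = np.sign(np.diff(y))     # -1 while falling, +1 while rising
    turns = np.diff(slope_sign)          # positive where falling turns into rising
    return (turns > 0).nonzero()[0] + 1  # +1 maps back to indices of y

y_demo = np.array([3, 1, 2, 0, 4])
print(find_valleys(y_demo))  # the dips sit at indices 1 and 3
```

Note that the first and last samples are never reported as valleys, and a flat bottom (repeated equal values) can be reported more than once.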

We're starting to repeat ourselves here. Let's make a function:

def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color=color, linewidth=4)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')    
for start, stop in contiguous_regions(y > 0):
    do_chunk(x[start:stop], y[start:stop], 'red')
    
for start, stop in contiguous_regions(y < 0):
    do_chunk(x[start:stop], y[start:stop], 'blue')
    
ax.axhline(0, color='grey')
plt.show()
plt.close()

That's better, and even better: it produces the same plot. What's next?

Now that those intervals are isolated (representing different events/cycles throughout the data), I want to record the highest point within each of these intervals.

But that's too easy. Let's make it like Jude, who probably made it.

def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        x_trim, y_trim = x_chunk[iv0:iv1], y_chunk[iv0:iv1]
        ax.plot(x_trim, y_trim, color=color, linewidth=4)
        
        # Get the index of the maximum value in this trim
        ip = np.argmax(y_trim)
        ax.scatter(x_trim[ip], y_trim[ip], color='blue')

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')    
for start, stop in contiguous_regions(y > 0):
    do_chunk(x[start:stop], y[start:stop], 'red')
    
for start, stop in contiguous_regions(y < 0):
    do_chunk(x[start:stop], y[start:stop], 'blue')
    
ax.axhline(0, color='grey')
plt.show()
plt.close()

Are you seeing that little dot over there? It's ours too. The maximum peak.

Then I want to find the intervals with the 5 highest peaks (one peak per each interval)

Ok, that's a harder one. Let's create some lists so we can store them!

def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        x_trim, y_trim = x_chunk[iv0:iv1], y_chunk[iv0:iv1]
        ax.plot(x_trim, y_trim, color=color, linewidth=4)
        
        ip = np.argmax(y_trim)
        ax.scatter(x_trim[ip], y_trim[ip], color='blue')
        
        # Return the x, y of the peak
        return x_trim[ip], y_trim[ip]
    
    return None

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')    

intervals = []

for start, stop in contiguous_regions(y > 0):
    # Receive it here
    peak = do_chunk(x[start:stop], y[start:stop], 'red')
    
    # If this chunk had enough valleys to yield a peak
    if peak is not None:
        # Store each interval as a dict (much like a JSON object)
        intervals.append({
            'start': start,
            'stop': stop,
            'peak': peak,
        })
    
for start, stop in contiguous_regions(y < 0):
    peak = do_chunk(x[start:stop], y[start:stop], 'blue')
    if peak is not None:
        intervals.append({
            'start': start,
            'stop': stop,
            'peak': peak,
        })
    
ax.axhline(0, color='grey')
plt.show()
plt.close()

So, what's inside intervals? Let's check it out! I ran it here, and it gives:

[{'start': 121, 'stop': 892, 'peak': (0.8098098098098099, 1.0602140027371494)}]

What does this even mean? It means that, from index 121 to index 892, the highest peak found is at x≈0.810 and y≈1.060. Great, huh? Since the data being used yields only one interval, that's the only entry.

To find the intervals with the highest peaks, sort them by peak height and keep the last five:

# High five!
high_five = [
    (interval["start"], interval["stop"])
    for interval in sorted(  # Sort it, so the highest peaks will be on the list tail
        intervals,
        key=lambda interval: interval["peak"][1],  # Sort by the y-value of its peak
    )[-5:]  # Get the last five
]

Lastly, I want to output the interval (or range) that contains these top 5 peaks.

It's easy now, but I'll leave this to you. Trust me, the worst part is already done.
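
If it helps, here is one possible way to do that last step: a rough sketch, not the only approach. It assumes intervals has the shape shown earlier (dicts with 'start', 'stop' and 'peak') and that x holds the values you want to report. The name interval_ranges is made up for this example. If your x were an array of dates instead of numbers, the same indexing should give you the start and end dates of each event.

```python
import numpy as np

def interval_ranges(intervals, x, top=5):
    """Return (x_start, x_end, peak_y) for the `top` intervals with the highest peaks."""
    ranked = sorted(intervals, key=lambda iv: iv["peak"][1], reverse=True)[:top]
    # 'stop' is exclusive, so the last sample inside the interval is at stop - 1
    return [(x[iv["start"]], x[iv["stop"] - 1], iv["peak"][1]) for iv in ranked]

# Same x as in the answer, and the single interval it printed
x = np.linspace(-1, 3, 1000)
intervals = [{'start': 121, 'stop': 892,
              'peak': (0.8098098098098099, 1.0602140027371494)}]

for x_start, x_end, peak_y in interval_ranges(intervals, x):
    print(f"event from x={x_start:.3f} to x={x_end:.3f}, peak y={peak_y:.3f}")
```

The result is ordered from the highest peak downwards, so the first entry is the most extreme event.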
