Let's use the data from your question. I'll add detailed explanations as comments.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-1, 3, 1000)
y = -0.1 * np.cos(12*x) + np.exp(-(1-x)**2)
I want to find a way to take time series data, identify all points at 0 on the y-axis, and then isolate the data in between
So basically you want to separate the consecutive points above y=0 from those below it. Based on this answer, you can do something like this:
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
# Find all consecutive chunks that are above y=0
for start, stop in contiguous_regions(y > 0):
    ax.plot(x[start:stop], y[start:stop], color='red')
# Find all consecutive chunks that are below y=0
for start, stop in contiguous_regions(y < 0):
    ax.plot(x[start:stop], y[start:stop], color='blue')
ax.axhline(0, color='grey')
plt.show()
plt.close()
As you can see, the blue points are below y=0 and the red points are above it, so the chunks are cleanly isolated.
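The snippets in this answer call contiguous_regions, which comes from the linked answer and is not defined here. Below is a minimal sketch of such a helper, assuming it returns [start, stop) index pairs for each run of True values in a boolean mask. The linked original may differ in its details.

```python
import numpy as np

def contiguous_regions(condition):
    """Return an (n, 2) array of [start, stop) index pairs, one per run of True values."""
    condition = np.asarray(condition, dtype=bool)
    # Positions where the mask flips between False and True
    flips = np.flatnonzero(np.diff(condition.astype(np.int8))) + 1
    # A run touching either end of the array has no flip there, so add that edge
    if condition.size and condition[0]:
        flips = np.r_[0, flips]
    if condition.size and condition[-1]:
        flips = np.r_[flips, condition.size]
    return flips.reshape(-1, 2)

# Made-up mask: one run at indices 1..2 and one at index 4
mask = np.array([False, True, True, False, True])
print(contiguous_regions(mask))  # rows [1, 3] and [4, 5]
```

Because stop is exclusive, x[start:stop] selects exactly the points of one run.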

For the graph I linked to here, I would want to isolate all data between the first and last red valley point.
You can do that too. For each chunk we need to find its valleys, which can be done with the approach from the link you sent.
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
for start, stop in contiguous_regions(y > 0):
    x_chunk, y_chunk = x[start:stop], y[start:stop]
    ax.plot(x_chunk, y_chunk, color='red')
    # Find all the valleys
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    # If there are more than two valleys, take the first and the last
    if valleys.size > 2:
        # Get'em!
        iv0, *_, iv1 = valleys
        # Plot'em!
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color='red', linewidth=4)
for start, stop in contiguous_regions(y < 0):
    x_chunk, y_chunk = x[start:stop], y[start:stop]
    ax.plot(x_chunk, y_chunk, color='blue')
    # The same.
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color='blue', linewidth=4)
ax.axhline(0, color='grey')
plt.show()
plt.close()
See that thick red line? That's the data between the first and the last valley, and it's ours, by right.
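To see why the np.diff(np.sign(np.diff(...))) expression finds valleys, here is a small worked example on made-up samples (y_demo is hypothetical, not part of the question's data). The inner diff gives the slope, sign reduces it to -1/0/+1, and the outer diff is positive exactly where the slope turns from falling to rising.

```python
import numpy as np

# Made-up samples with valleys at indices 1 and 4, peaks at indices 3 and 5
y_demo = np.array([3.0, 1.0, 2.0, 4.0, 0.5, 2.5, 2.0])

slope_sign = np.sign(np.diff(y_demo))   # -1 where falling, +1 where rising
turns = np.diff(slope_sign)             # +2 at a valley, -2 at a peak

# +1 shifts from "position in the diff array" back to "position in y_demo"
valleys = (turns > 0).nonzero()[0] + 1
peaks = (turns < 0).nonzero()[0] + 1

print(valleys)  # [1 4]
print(peaks)    # [3 5]
```

One caveat: on flat stretches the sign is 0, so a flat-bottomed valley can be reported more than once. For noisy data, something like scipy.signal.find_peaks applied to -y may be a more robust choice.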

We're starting to repeat ourselves here. Let's make a function:
def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color=color, linewidth=4)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
for start, stop in contiguous_regions(y > 0):
    do_chunk(x[start:stop], y[start:stop], 'red')
for start, stop in contiguous_regions(y < 0):
    do_chunk(x[start:stop], y[start:stop], 'blue')
ax.axhline(0, color='grey')
plt.show()
plt.close()
That's better, and it still produces the same plot. What's next?
Now that those intervals are isolated (representing different events/cycles throughout the data), I want to record the highest point within each of these intervals.
That one is easy: np.argmax gives the index of the highest point in the trimmed chunk. Let's mark it on the plot too.
def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        x_trim, y_trim = x_chunk[iv0:iv1], y_chunk[iv0:iv1]
        ax.plot(x_trim, y_trim, color=color, linewidth=4)
        # Get the index of the maximum value in this trim
        ip = np.argmax(y_trim)
        ax.scatter(x_trim[ip], y_trim[ip], color='blue')

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
for start, stop in contiguous_regions(y > 0):
    do_chunk(x[start:stop], y[start:stop], 'red')
for start, stop in contiguous_regions(y < 0):
    do_chunk(x[start:stop], y[start:stop], 'blue')
ax.axhline(0, color='grey')
plt.show()
plt.close()
See that little dot over there? It's ours too: the highest peak of the interval.
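If you only need the numbers and not the plot, the same logic can be pulled out into a plot-free helper. This is a sketch under the same assumptions as do_chunk. The name peak_between_valleys and the demo arrays are hypothetical.

```python
import numpy as np

def peak_between_valleys(x_chunk, y_chunk):
    """Return (x, y) of the highest point between the first and last valley, or None."""
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size <= 2:  # same threshold as do_chunk: more than two valleys required
        return None
    iv0, iv1 = valleys[0], valleys[-1]
    x_trim, y_trim = x_chunk[iv0:iv1], y_chunk[iv0:iv1]
    ip = np.argmax(y_trim)  # index of the maximum inside the trimmed chunk
    return float(x_trim[ip]), float(y_trim[ip])

# Made-up chunk with valleys at indices 1, 3, 5 and 7
x_demo = np.arange(9.0)
y_demo = np.array([5.0, 1.0, 3.0, 2.0, 6.0, 1.0, 4.0, 0.5, 2.0])
print(peak_between_valleys(x_demo, y_demo))  # (4.0, 6.0)
```

Note that the value 5.0 at index 0 is ignored because it lies before the first valley, which matches the trimming done in the answer.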

Then I want to find the intervals with the 5 highest peaks (one peak per each interval)
OK, that's a harder one. Let's create a list so we can store them!
def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        x_trim, y_trim = x_chunk[iv0:iv1], y_chunk[iv0:iv1]
        ax.plot(x_trim, y_trim, color=color, linewidth=4)
        ip = np.argmax(y_trim)
        ax.scatter(x_trim[ip], y_trim[ip], color='blue')
        # Return the x, y of the peak
        return x_trim[ip], y_trim[ip]
    return None

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
intervals = []
for start, stop in contiguous_regions(y > 0):
    # Receive it here
    peak = do_chunk(x[start:stop], y[start:stop], 'red')
    # If this chunk contains more than two valleys
    if peak is not None:
        # Store each interval as a dict (it looks a lot like a JSON object)
        intervals.append({
            'start': start,
            'stop': stop,
            'peak': peak,
        })
for start, stop in contiguous_regions(y < 0):
    peak = do_chunk(x[start:stop], y[start:stop], 'blue')
    if peak is not None:
        intervals.append({
            'start': start,
            'stop': stop,
            'peak': peak,
        })
ax.axhline(0, color='grey')
plt.show()
plt.close()
So, what's inside intervals? Let's check it out! Running it here gives
[{'start': 121, 'stop': 892, 'peak': (0.8098098098098099, 1.0602140027371494)}]
What does this mean? From index 121 up to (but not including) index 892, the highest peak found is at x ≈ 0.810 and y ≈ 1.060. Great, huh? Since this data yields only one interval with a peak, that's the only entry.
To find the five highest peaks, sort the intervals by the y-value of their peaks and keep the last five:
# High five!
high_five = [
    (interval["start"], interval["stop"])
    for interval in sorted(
        intervals,
        key=lambda interval: interval["peak"][1],  # Sort by the y-value of the peak
    )[-5:]  # Keep the last five, i.e. the five highest peaks
]
Lastly, I want to output the interval (or range) that contains these top 5 peaks.
It's easy now, but I'll leave this to you. Trust me, the worst part is already done.
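If you want a starting point for that last step, here is one possible sketch. It assumes the intervals list has the same shape as above. The entries below are made-up values for illustration, and "the range that contains the top 5 peaks" is read in two ways: the five (start, stop) pairs themselves, and the single span that covers all of them.

```python
# Hypothetical intervals in the same shape as the list built above (values are made up)
intervals = [
    {'start': 10,  'stop': 40,  'peak': (0.2, 1.5)},
    {'start': 50,  'stop': 80,  'peak': (0.6, 0.3)},
    {'start': 90,  'stop': 120, 'peak': (1.0, 2.1)},
    {'start': 130, 'stop': 160, 'peak': (1.4, 0.9)},
    {'start': 170, 'stop': 200, 'peak': (1.8, 1.1)},
    {'start': 210, 'stop': 240, 'peak': (2.2, 0.1)},
    {'start': 250, 'stop': 280, 'peak': (2.6, 1.8)},
]

# Highest peaks first, then keep at most five
top_five = sorted(intervals, key=lambda iv: iv['peak'][1], reverse=True)[:5]

# Reading 1: the (start, stop) index range of each top interval
ranges = [(iv['start'], iv['stop']) for iv in top_five]

# Reading 2: one overall range spanning all top intervals
overall = (min(start for start, _ in ranges), max(stop for _, stop in ranges))

print(ranges)   # [(90, 120), (250, 280), (10, 40), (170, 200), (130, 160)]
print(overall)  # (10, 280)
```

With real data, x[start] and x[stop - 1] convert each index range back to x-values, since stop is exclusive.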