Using fuglede's answer, it's easy to find the local extrema of a DataFrame column :
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
xs.append(xs[-1]*0.9 + r)
df = pd.DataFrame(xs, columns=['data'])
# Find local peaks
df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]
# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
df.data.plot()
Which gives the following graph :
I now would like to group those extrema in pairs (minimum & extremum that are neighbors, in this order) and remove the pairs where extremum < minimum + threshold
. By removing I mean replacing the corresponding values in df['min']
and df['max']
by nan
s.
This basically filters the irrelevant small extrema.
I've tried find_peaks
with various options but none gave the intended results.
Is there an elegant and fast way to do this ?