I plotted the minimum points of df['Data']:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Timestamp = pd.date_range('2020-02-06 08:23:04', periods=1000, freq='s')
df = pd.DataFrame({'Timestamp': Timestamp,
                   'Data': 30+15*np.cos(np.linspace(0,10,Timestamp.size))})
# forward differences: row i holds (value at i+1) - (value at i)
df['timediff'] = (df['Timestamp'].shift(-1) - df['Timestamp']).dt.total_seconds()
df['datadiff'] = df['Data'].shift(-1) - df['Data']
df['gradient'] = df['datadiff'] / df['timediff']
min_pt = np.min(df['Data'])
# filter_pt = df.loc(df['gradient'] >= -0.1) # & df.loc[i, 'gradient'] <=0.1
# True only where Data equals the global minimum
mask = np.array(df['Data']) == min_pt
color = np.where(mask, 'blue', 'yellow')
fig,ax = plt.subplots(figsize=(20,10))
# plt.plot_date(df['Timestamp'], df['Data'], '-' )
ax.scatter(df['Timestamp'], df['Data'], color=color, s=10)
plt.show()
I want to extend the condition using the df['gradient'] column:
- Instead of marking only the minimum points, how can I mark the points where the gradient lies between -0.1 and 0.1 inclusive?
- Additional condition: take only the first data point in such a range (i.e. -0.1 to 0.1 inclusive).
- How can I loop through the whole dataset, rather than taking only the first data point that satisfies these conditions (which is what my current plot did)?
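One possible way to express these conditions without an explicit loop is sketched below. It assumes that "first data point" means the first row of every consecutive run of rows whose gradient is within the range. The names in_range and first_in_run are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

Timestamp = pd.date_range('2020-02-06 08:23:04', periods=1000, freq='s')
df = pd.DataFrame({'Timestamp': Timestamp,
                   'Data': 30 + 15 * np.cos(np.linspace(0, 10, Timestamp.size))})
df['timediff'] = (df['Timestamp'].shift(-1) - df['Timestamp']).dt.total_seconds()
df['datadiff'] = df['Data'].shift(-1) - df['Data']
df['gradient'] = df['datadiff'] / df['timediff']

# True where -0.1 <= gradient <= 0.1 (both ends inclusive; NaN gives False)
in_range = df['gradient'].between(-0.1, 0.1)

# True only where a run starts: this row is in range and the previous row is not
# (assumption: "first data point" means the start of each consecutive run)
first_in_run = in_range & ~in_range.shift(1, fill_value=False)

# Same colouring scheme as the original scatter plot
color = np.where(first_in_run, 'blue', 'yellow')
```

The resulting color array can be passed to ax.scatter(..., color=color) exactly as in the code above. Because the mask is computed over the whole column at once, every run in the dataset is handled, not just the first one.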
I tried adding:
df1 = df[df.gradient <= 0.1 & df.gradient >= -0.1]
plt.plot(df1.Timestamp,df1.Data, label="filter")
before the mask line, based on this answer, which raised the error:
TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]
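A likely cause, as a minimal sketch: in Python, & binds more tightly than <= and >=, so the unparenthesized expression is parsed as gradient <= (0.1 & gradient) >= -0.1, and pandas is asked to "and" a float with a float column. The small gradient Series below is made-up illustration data, not the real column.

```python
import pandas as pd

# Made-up values for illustration only
gradient = pd.Series([-0.5, -0.1, 0.0, 0.05, 0.1, 0.3])

# Without parentheses, `0.1 & gradient` is evaluated before the comparisons,
# so the expression raises an exception instead of building a mask
try:
    gradient <= 0.1 & gradient >= -0.1
    raised = False
except (TypeError, ValueError):
    raised = True

# Parenthesizing each comparison gives the intended element-wise mask
mask_paren = (gradient <= 0.1) & (gradient >= -0.1)

# Series.between is an equivalent spelling (inclusive on both ends by default)
mask_between = gradient.between(-0.1, 0.1)
```

Either mask can then be used to index the frame, for example df[mask_paren].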
I think what I did wasn't very efficient. How can I do it more efficiently?
Update:
With this code:
Timestamp = pd.date_range('2020-02-06 08:23:04', periods=1000, freq='s')
df = pd.DataFrame({'Timestamp': Timestamp,
                   'Data': 30+15*np.cos(np.linspace(0,10,Timestamp.size))})
df['timediff'] = (df['Timestamp'].shift(-1) - df['Timestamp']).dt.total_seconds()
df['datadiff'] = df['Data'].shift(-1) - df['Data']
df['gradient'] = df['datadiff'] / df['timediff']
fig,ax = plt.subplots(figsize=(20,10))
df1 = df[(df.gradient <= 0.1) & (df.gradient >= -0.1)]
plt.plot(df1.Timestamp,df1.Data, label="filter")
plt.show()
After changing the range to
df1 = df[(df.gradient <= 0.01) & (df.gradient >= -0.01)]
Why?