Take this df
np.random.seed(1)
n = 30000
df = pd.DataFrame({'temp1': np.random.normal(loc=1, scale=0.02, size=n)},
index=pd.date_range('2019-01-02 09:00', periods=n, freq='min'))
overtemp
is fairly simple with a rolling window. Pandas takes care of any skipped periods of time by aligning indexes.
df['overtemp'] = df.temp1.rolling('14D').max() \
.shift(-13, freq='D') / df.temp1 > 1.02
Output
temp1 overtemp
2019-01-02 09:00:00 1.032487 True
2019-01-02 09:01:00 0.987765 True
2019-01-02 09:02:00 0.989437 True
2019-01-02 09:03:00 0.978541 True
2019-01-02 09:04:00 1.017308 True
... ... ...
2019-01-23 04:55:00 1.013796 False
2019-01-23 04:56:00 1.011056 False
2019-01-23 04:57:00 0.998323 False
2019-01-23 04:58:00 0.974329 False
2019-01-23 04:59:00 1.006846 False
time_to_overtemp
is more computationally intensive and there is no built-in functionality in pandas for this case.
There are numpy tricks to find idxmax in a rolling window, as suggested by this blog and this question. But you are not asking for time_to_peak
within the window, you want the time to the first row over 2% of current.
I don't know of any other way than looping for that part. But if it helps, the following is the code to get time_to_peak
.
We basically want to do the same rolling window but instead of max
we look for idxmax
. If you did this by looping you would have to, for each row, slice a forward-looking window to find idxmax and get idx diff. A solution without looping would create a (potentially huge) df by pivoting the windows and having 2 weeks worth of columns for each row.
First, make sure the index is not missing any rows.
df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq='min'))
Then
def rolling_window(a, window):
'''
from https://rigtorp.se/2011/01/01/rolling-statistics-numpy.html
Defines a forward-looking rolling window
drops tail elements that do not have a full window
...added writeable=False as advised by numpy docs
'''
shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
# creates a new shape with the last axis reduced and a new axis of size <window>
strides = a.strides + (a.strides[-1],)
# adds the corresponding strides to match the new shape
return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides, writeable=False)
temp1 = df.temp1.values
window = 60 * 24 * 14 # 2 weeks in minute increments
df['time_to_window_peak'] = np.hstack((
np.argmax(rolling_window(temp1, window), axis=-1),
np.repeat(np.nan, window - 1)
))
print(df)
df['time_to_window_peak'] = pd.to_timedelta(df.time_to_overtemp, unit='min')
Output
temp1 overtemp time_to_window_peak
2019-01-02 09:00:00 1.032487 True 10 days 08:46:00
2019-01-02 09:01:00 0.987765 True 10 days 08:45:00
2019-01-02 09:02:00 0.989437 True 10 days 08:44:00
2019-01-02 09:03:00 0.978541 True 10 days 08:43:00
2019-01-02 09:04:00 1.017308 True 10 days 08:42:00
... ... ... ...
2019-01-23 04:55:00 1.013796 False NaT
2019-01-23 04:56:00 1.011056 False NaT
2019-01-23 04:57:00 0.998323 False NaT
2019-01-23 04:58:00 0.974329 False NaT
2019-01-23 04:59:00 1.006846 False NaT