-1

I have two years worth of sensor data in a pandas dataframe, interval is 1 minute. The index is a time series. Looks like this:


                     temp1     overtemp  time_to_overtemp
datetime                                                    
2019-01-02 09:31:00  305.96
2019-01-02 09:32:00  305.98
2019-01-02 09:33:00  305.70
2019-01-02 09:34:00  305.30
2019-01-02 09:35:00  306.88

My goal is to set overtemp = 1 if the temperature increases by 2% in the next 2 weeks and a 0 if it does not. I then want to populate the time_to_overtemp with the time-delay to the maximum of that two week temperature.

I had another thread recommend a solution that used .apply(), but it was slow. I wanted to implement a list comprehension to speed things up.

This is what I am working with right now:

df['overtemp'] = [1 if i*1.02 <= df['temp1'].rolling('14D').max() else 0 for i in df['temp1']]

I have gotten this to work with other logic, but when I add the <= logic I get a value error:

The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Also, I think the rolling window is looking backwards by default? How can I get it looking forwards?

Fr3nch
  • 9
  • 4

1 Answers1

1

Take this df

np.random.seed(1)
n = 30000
df = pd.DataFrame({'temp1': np.random.normal(loc=1, scale=0.02, size=n)}, 
    index=pd.date_range('2019-01-02 09:00', periods=n, freq='min'))

overtemp is fairly simple with a rolling window. Pandas takes care of any skipped periods of time by aligning indexes.

df['overtemp'] = df.temp1.rolling('14D').max() \
    .shift(-13, freq='D') / df.temp1 > 1.02

Output

                        temp1  overtemp
2019-01-02 09:00:00  1.032487      True
2019-01-02 09:01:00  0.987765      True
2019-01-02 09:02:00  0.989437      True
2019-01-02 09:03:00  0.978541      True
2019-01-02 09:04:00  1.017308      True
...                       ...       ...
2019-01-23 04:55:00  1.013796     False
2019-01-23 04:56:00  1.011056     False
2019-01-23 04:57:00  0.998323     False
2019-01-23 04:58:00  0.974329     False
2019-01-23 04:59:00  1.006846     False

time_to_overtemp is more computationally intensive and there is no built-in functionality in pandas for this case.

There are numpy tricks to find idxmax in a rolling window, as suggested by this blog and this question. But you are not asking for time_to_peak within the window, you want the time to the first row over 2% of current.

I don't know of any other way than looping for that part. But if it helps, the following is the code to get time_to_peak.


We basically want to do the same rolling window but instead of max we look for idxmax. If you did this by looping you would have to, for each row, slice a forward-looking window to find idxmax and get idx diff. A solution without looping would create a (potentially huge) df by pivoting the windows and having 2 weeks worth of columns for each row.

First, make sure the index is not missing any rows.

df = df.reindex(pd.date_range(df.index.min(), df.index.max(), freq='min'))

Then

def rolling_window(a, window):
    '''
    from https://rigtorp.se/2011/01/01/rolling-statistics-numpy.html
    
    Defines a forward-looking rolling window
    drops tail elements that do not have a full window
    ...added writeable=False as advised by numpy docs
    
    '''
    
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
        # creates a new shape with the last axis reduced and a new axis of size <window>
    strides = a.strides + (a.strides[-1],)
        # adds the corresponding strides to match the new shape
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides, writeable=False)

temp1 = df.temp1.values
window = 60 * 24 * 14 # 2 weeks in minute increments

df['time_to_window_peak'] = np.hstack((
    np.argmax(rolling_window(temp1, window), axis=-1),
    np.repeat(np.nan, window - 1)
    ))
print(df)
df['time_to_window_peak'] = pd.to_timedelta(df.time_to_overtemp, unit='min')

Output

                        temp1  overtemp time_to_window_peak
2019-01-02 09:00:00  1.032487      True    10 days 08:46:00
2019-01-02 09:01:00  0.987765      True    10 days 08:45:00
2019-01-02 09:02:00  0.989437      True    10 days 08:44:00
2019-01-02 09:03:00  0.978541      True    10 days 08:43:00
2019-01-02 09:04:00  1.017308      True    10 days 08:42:00
...                       ...       ...                 ...
2019-01-23 04:55:00  1.013796     False                 NaT
2019-01-23 04:56:00  1.011056     False                 NaT
2019-01-23 04:57:00  0.998323     False                 NaT
2019-01-23 04:58:00  0.974329     False                 NaT
2019-01-23 04:59:00  1.006846     False                 NaT
RichieV
  • 5,103
  • 2
  • 11
  • 24