I have two years' worth of sensor data in a pandas DataFrame whose index is a time series. It looks like this:
                      temp1  overtemp  time_to_overtemp
datetime
2019-01-02 09:31:00  305.96
2019-01-02 09:32:00  305.98
2019-01-02 09:33:00  305.70
2019-01-02 09:34:00  305.30
2019-01-02 09:35:00  306.88
What I want to do is loop over the time series to populate the "overtemp" and "time_to_overtemp" columns. "overtemp" should be 1 if, at any time in the next two weeks, the temperature rises more than 2% above the current reading. "time_to_overtemp" should show the time until the next reading that is more than 2% higher, if one exists within the next two weeks. If the temperature stays within 2% for the next two weeks, both columns should be 0.
For example, the row for 2019-01-02 09:31:00 should look at the next two weeks' worth of temperature data and get a 0 in both columns, because all readings in that period are within 2% of its value. The overtemp value for 2020-01-02 09:35:00 should be 1 because the temperature rose by 5% a week later, and its time_to_overtemp value should be 7 days, 2 hours, 38 minutes, because that's when the overtemp occurred.
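A literal, row-by-row translation of this rule may help pin down the semantics before worrying about speed. The sketch below assumes a sorted, tz-naive DatetimeIndex, treats "the next two weeks" as the interval (t, t + 14 days], and uses a zero-length Timedelta as the "0" for time_to_overtemp so that column keeps a timedelta dtype. The function name label_overtemp_reference and its parameters are hypothetical. It does roughly rows × rows-per-window comparisons, so it is only practical on a sample or for cross-checking a faster version.

```python
import numpy as np
import pandas as pd


def label_overtemp_reference(df, col="temp1", window="14D", rise=0.02):
    """Hypothetical helper: literal version of the rule.

    For each row, look at readings in (t, t + window] and report the first one
    above (1 + rise) * current reading. Assumes a sorted, tz-naive DatetimeIndex.
    """
    temps = df[col].to_numpy(dtype=float)
    times = df.index
    limit = pd.Timedelta(window)
    overtemp = np.zeros(len(df), dtype=int)
    time_to = [pd.Timedelta(0)] * len(df)  # zero Timedelta means "no overtemp"

    for i in range(len(df)):
        # Positions of rows strictly after t_i and no later than t_i + window.
        start = times.searchsorted(times[i], side="right")
        stop = times.searchsorted(times[i] + limit, side="right")
        hits = np.flatnonzero(temps[start:stop] > temps[i] * (1 + rise))
        if hits.size:
            j = start + hits[0]  # earliest qualifying reading
            overtemp[i] = 1
            time_to[i] = times[j] - times[i]

    out = df.copy()
    out["overtemp"] = overtemp
    out["time_to_overtemp"] = pd.to_timedelta(time_to).to_numpy()
    return out
```

On the five rows shown above it would label everything 0, since none of those readings is more than 2% above an earlier one.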
I am already doing some other calculations using iterrows:
for datetime, row in df.iterrows():
    ...  # other per-row calculations
but it's taking forever, and I haven't figured out how to do the time-window lookups and calculations at all yet.
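Part of the slowness is that iterrows builds a new Series for every row. One possible alternative, sketched below under the same assumptions (sorted, tz-naive DatetimeIndex; window (t, t + 14 days]; zero Timedelta for "no overtemp"), works on plain NumPy arrays and makes a single right-to-left pass with a monotonic stack, the same idea as the classic "next greater element" problem. For each row it binary-searches the stack for the earliest later reading above 1.02 × the current one, then checks whether that reading falls within two weeks. label_overtemp and its parameters are hypothetical names, not a pandas API.

```python
import bisect

import numpy as np
import pandas as pd


def label_overtemp(df, col="temp1", window="14D", rise=0.02):
    """Hypothetical helper: one right-to-left pass with a monotonic stack.

    Assumes a sorted, tz-naive DatetimeIndex.
    """
    temps = df[col].to_numpy(dtype=float)
    times = df.index.to_numpy()  # datetime64[ns] for a tz-naive index
    limit = pd.Timedelta(window).to_timedelta64()
    n = len(temps)

    overtemp = np.zeros(n, dtype=int)
    time_to = np.zeros(n, dtype="timedelta64[ns]")

    # The stack holds later rows that are hotter than every reading between
    # them and the current row. Row positions and temperatures both decrease
    # from list index 0 to the end; stack_neg stores negated temperatures so
    # it is sorted ascending and can be searched with bisect.
    stack_idx = []
    stack_neg = []

    for i in range(n - 1, -1, -1):
        threshold = temps[i] * (1 + rise)
        # k = number of stack entries hotter than threshold (they form a prefix).
        k = bisect.bisect_left(stack_neg, -threshold)
        if k > 0:
            j = stack_idx[k - 1]  # earliest later row above the threshold
            delta = times[j] - times[i]
            if delta <= limit:
                overtemp[i] = 1
                time_to[i] = delta
        # Row i hides every stacked row that is not hotter than it.
        while stack_neg and -stack_neg[-1] <= temps[i]:
            stack_neg.pop()
            stack_idx.pop()
        stack_idx.append(i)
        stack_neg.append(-temps[i])

    out = df.copy()
    out["overtemp"] = overtemp
    out["time_to_overtemp"] = time_to
    return out
```

Usage would be `df = label_overtemp(df)`. The earliest reading above the threshold is always hotter than everything between it and the current row, so it is guaranteed to be on the stack; the whole pass is O(n log n) instead of O(n × rows-per-window).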
I have done other labeling with:
df['overtemp'] = np.select([df['temp1']<305, df['temp1']>305], [1,0])
I assume this vectorizes the operation? It is certainly much faster than iterating, but I can't figure out how to implement the "datetime + two weeks" part.
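Because pandas time-based rolling windows look backward, one way to get a vectorized overtemp flag is to mirror the time axis so that "the next two weeks" becomes "the previous two weeks", then take a rolling max over a "14D" offset window. This is a sketch assuming a sorted, tz-naive DatetimeIndex and positive readings; overtemp_flag and the arbitrary anchor date are hypothetical. It only tells you whether a rise happens, not when, so time_to_overtemp still needs a per-row search of some kind.

```python
import pandas as pd


def overtemp_flag(df, col="temp1", window="14D", rise=0.02):
    """Hypothetical helper: 0/1 flag, 1 if `col` exceeds (1 + rise) * current
    value anywhere in [t, t + window]. Assumes a sorted, tz-naive DatetimeIndex."""
    times = df.index
    # Mirror time: s = anchor + (t_max - t) grows as t shrinks, so reversing
    # the rows gives an ascending index that rolling() accepts.
    anchor = pd.Timestamp("2000-01-01")  # arbitrary reference point
    mirrored = anchor + (times.max() - times)
    rev = pd.Series(df[col].to_numpy()[::-1], index=mirrored[::-1])
    # Backward window [s - window, s] in mirrored time == [t, t + window] in real time.
    fwd_max = rev.rolling(window, closed="both").max().to_numpy()[::-1]
    # The current row is in its own window, but x > (1 + rise) * x is false
    # for positive readings, so it never triggers the flag by itself.
    flag = (fwd_max > df[col].to_numpy() * (1 + rise)).astype(int)
    return pd.Series(flag, index=df.index, name="overtemp")
```

For example, `df["overtemp"] = overtemp_flag(df)`.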