
I would like to detect a regime change (or anomaly) in a time series. By regime change, I mean that the linear trend changes or breaks (see the plot produced by the code below).

import numpy as np
import matplotlib.pyplot as plt

x = range(50)
y = [1.28, 1.28, 1.26, 1.32, 1.34, 1.33, 1.38, 1.39, 1.37, 1.42, 
1.42, 1.41, 1.39, 1.41, 1.45, 1.45, 1.46, 1.5, 1.49, 1.53, 1.53, 
1.54, 1.61, 1.59, 1.62, 1.66, 1.63, 1.66, 1.66, 1.7, 1.76, 1.84, 
1.88, 1.97, 1.94, 1.98, 2.01, 2.02, 0.73, 0.72, 0.76, 0.87, 0.97, 
1.01, 0.98, 1.16, 1.22, 1.3, 1.27, 1.33]

plt.scatter(x, y)
plt.show()

I have been searching for a while but cannot find a way to detect the big change in this time series.

Detecting a large difference between consecutive points is not enough for me, because I need to detect that the rough linear trend has changed. The data can show a large difference from one observation to the next while still following the linear trend.

To explain why I abandoned the diff method:

The observations around x = 45-46 show a jump in value, but they still follow the linear trend, so for me this is not a "regime change". This is exactly why I abandoned the diff method and am looking for a "trend" method. I have considered looping over the observations: fitting a linear regression, predicting the next point, computing the error, and so on. But I would rather use a library made for this, if one exists.
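The looping idea described above can be sketched with plain NumPy. This is a minimal sketch, not a library API: `rolling_trend_breaks`, `window=10`, `min_points=5` and `threshold=0.5` are hypothetical names and values chosen by eye for this particular data. For each point it fits a straight line to the preceding points with `np.polyfit`, predicts the current point, and flags it when the absolute prediction error exceeds the threshold. After a flag, the fit restarts from the new regime so that the points following the break are judged against the new trend rather than the old one.

```python
import numpy as np

y = [1.28, 1.28, 1.26, 1.32, 1.34, 1.33, 1.38, 1.39, 1.37, 1.42,
     1.42, 1.41, 1.39, 1.41, 1.45, 1.45, 1.46, 1.5, 1.49, 1.53, 1.53,
     1.54, 1.61, 1.59, 1.62, 1.66, 1.63, 1.66, 1.66, 1.7, 1.76, 1.84,
     1.88, 1.97, 1.94, 1.98, 2.01, 2.02, 0.73, 0.72, 0.76, 0.87, 0.97,
     1.01, 0.98, 1.16, 1.22, 1.3, 1.27, 1.33]


def rolling_trend_breaks(y, window=10, min_points=5, threshold=0.5):
    """Hypothetical helper: flag points that a local linear fit fails to predict.

    For each t, fit a line to at most `window` previous points (never
    reaching back past the last detected break), predict y[t], and flag t
    if the absolute prediction error exceeds `threshold`.
    """
    y = np.asarray(y, dtype=float)
    breaks = []
    start = 0  # first index the fit may use (moves forward after each break)
    for t in range(len(y)):
        lo = max(start, t - window)
        if t - lo < min_points:
            continue  # not enough history in the current regime to fit a line
        xs = np.arange(lo, t)
        slope, intercept = np.polyfit(xs, y[lo:t], 1)
        predicted = slope * t + intercept
        if abs(y[t] - predicted) > threshold:
            breaks.append(t)
            start = t  # restart the trend from the new regime
    return breaks


print(rolling_trend_breaks(y))  # [38]
```

With these settings only the drop at index 38 is flagged; the jump around 45 stays within the threshold because, by then, the fit uses only post-break points that already follow the new upward trend. The absolute threshold is data-specific; a scale-free variant could compare the error to the spread of the in-window residuals instead.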

Timothée HENRY
  • Define "regime change". And what do you want to do with the result of this "detection"? – Scott Hunter Jun 26 '19 at 12:23
  • Did you have a look at [this](https://stackoverflow.com/questions/30366142/finding-very-large-jumps-in-data) and [this](https://stackoverflow.com/questions/22583391/peak-signal-detection-in-realtime-timeseries-data) and [this](https://stackoverflow.com/questions/26632205/finding-a-spike-or-drop-in-a-dataset-programatically) – Sheldore Jun 26 '19 at 12:23
  • Look at the first answer in the marked duplicate using `np.diff()` – Sheldore Jun 26 '19 at 12:26
  • @Sheldore I had looked at similar solutions, such as the diff, but they do not solve my problem. I have added some comments. – Timothée HENRY Jun 26 '19 at 12:33
    @tucson Then please show an example where plain difference (absolute value thereof) does not suffice. – dedObed Jun 26 '19 at 12:38
    @dedObed Actually, the observations around x = 45-46 show a jump in value, which would trigger a rather large difference, but they still follow the linear trend, so for me this is not a "regime change". This is exactly why I abandoned the diff method and am looking for a "trend" method. – Timothée HENRY Jun 26 '19 at 13:00
  • @tucson See my answer. If it doesn't fit your needs, I'll need a tougher example; this one is way too easy ;-) – dedObed Jun 26 '19 at 13:35


Let me plot the first-order differences (orange) and second-order differences (green) of your data:

[Plot: data (blue), first-order differences (orange) and second-order differences (green)]

As far as I can see, both of these discriminate the jump well; in this case, simple thresholding would suffice as a classifier.

The second-order difference should be especially indicative of a jump, given how you describe the task: for a linear trend, it is bound to be (around) zero in the sections without jumps.
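To make this argument explicit, assume that within one regime the series is a straight line plus noise, y_t = a + b t + ε_t (an assumption about the data, not something it guarantees), and use forward differences as `get_deltas` does. The trend terms cancel in the second difference, and a level jump of size h at index τ shows up as a +h/-h pair:

```latex
\begin{aligned}
y_t &= a + b\,t + \varepsilon_t \\
\Delta y_t &= y_{t+1} - y_t = b + \varepsilon_{t+1} - \varepsilon_t \\
\Delta^2 y_t &= \Delta y_{t+1} - \Delta y_t = \varepsilon_{t+2} - 2\varepsilon_{t+1} + \varepsilon_t \\[4pt]
y_t &= a + b\,t + h\,\mathbf{1}[t \ge \tau] + \varepsilon_t
  \;\Rightarrow\;
  \Delta^2 y_{\tau-2} \text{ gains } +h,\quad \Delta^2 y_{\tau-1} \text{ gains } -h
\end{aligned}
```

For this data τ = 38 and h is roughly -1.3, so the second differences spike at indices 36 and 37, while elsewhere they only reflect the noise combination ε_{t+2} - 2ε_{t+1} + ε_t.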

Full code to reproduce the plot:

import numpy as np
import matplotlib.pyplot as plt


x = range(50)
y = [1.28, 1.28, 1.26, 1.32, 1.34, 1.33, 1.38, 1.39, 1.37, 1.42,
1.42, 1.41, 1.39, 1.41, 1.45, 1.45, 1.46, 1.5, 1.49, 1.53, 1.53,
1.54, 1.61, 1.59, 1.62, 1.66, 1.63, 1.66, 1.66, 1.7, 1.76, 1.84,
1.88, 1.97, 1.94, 1.98, 2.01, 2.02, 0.73, 0.72, 0.76, 0.87, 0.97,
1.01, 0.98, 1.16, 1.22, 1.3, 1.27, 1.33]


def get_deltas(series):
    # Forward differences: series[i+1] - series[i]
    return np.diff(series)


y_delta = get_deltas(y)              # first-order differences
y_delta_delta = get_deltas(y_delta)  # second-order differences

plt.scatter(x, y)                    # data (blue)
plt.scatter(x[:-1], y_delta)         # first-order differences (orange)
plt.scatter(x[:-2], y_delta_delta)   # second-order differences (green)
plt.show()
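The "simple thresholding" mentioned above can be sketched in a few lines. This is a minimal sketch: `second_diff_spikes` is a hypothetical helper and `threshold=0.5` is a value picked by eye for this data, not a tuned or general setting. It uses `np.diff(..., n=2)`, which computes y[i+2] - 2*y[i+1] + y[i] directly.

```python
import numpy as np

y = [1.28, 1.28, 1.26, 1.32, 1.34, 1.33, 1.38, 1.39, 1.37, 1.42,
     1.42, 1.41, 1.39, 1.41, 1.45, 1.45, 1.46, 1.5, 1.49, 1.53, 1.53,
     1.54, 1.61, 1.59, 1.62, 1.66, 1.63, 1.66, 1.66, 1.7, 1.76, 1.84,
     1.88, 1.97, 1.94, 1.98, 2.01, 2.02, 0.73, 0.72, 0.76, 0.87, 0.97,
     1.01, 0.98, 1.16, 1.22, 1.3, 1.27, 1.33]


def second_diff_spikes(y, threshold=0.5):
    """Hypothetical helper: indices i where |y[i+2] - 2*y[i+1] + y[i]| > threshold."""
    d2 = np.diff(np.asarray(y, dtype=float), n=2)  # second-order differences
    return np.flatnonzero(np.abs(d2) > threshold).tolist()


spikes = second_diff_spikes(y)
print(spikes)  # [36, 37]
# A level jump between y[tau-1] and y[tau] produces spikes at tau-2 and tau-1,
# so the first spike index + 2 gives the start of the new regime.
print(spikes[0] + 2)  # 38
```

Around x = 44-46 the second differences stay at about 0.21 in magnitude or less, well below the threshold, so that jump is not flagged.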
dedObed