I have a list of values, my_list, which shows the usage of a device at different times, as below:

my_list=[0.0, 11500312.5, 12293437.5, 11896875.0, 7711186.0,     
             3281768.863, 3341550.1363, 3300694.0,...]

I have many lists of this type, and I want to find the points where the most significant changes (decreases or increases) occur. One of these lists is plotted below. For example, if you look at the second, third and fourth points in the graph, you can see that the values differ only slightly, but the value suddenly decreases at the fifth and sixth points. Similar significant changes happen between points 20, 21 and 22.

[image: plot of one of the lists]

So you can see in the plot that there are two or three significant increases and decreases compared with the other times. Any idea how to find these points automatically?

Spedo
  • This is a very broad question about anomaly detection. – James Apr 15 '19 at 13:13
  • What do you mean by significant, 1ms, 10m, 5h, 3d, 5y? – SanMu Apr 15 '19 at 13:14
  • You need to define what you consider significant. One easy way is to look at the variation between two numbers; say we treat 10% as a significant variation. There are two ways to approach this, depending on your data: 1. look at the variation between one value and the next, `L[k]` and `L[k+1]`; 2. look at the variation over subgroups of values (e.g. bins of 50 or 100 values). There are many other ways to perform this type of automatic detection, but you need to define what you want and characterize your datasets to choose the best algorithm. – Mathieu Apr 15 '19 at 13:18
  • I have edited the question to be more clear. – Spedo Apr 15 '19 at 13:24
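
The first approach suggested in the comments above (comparing each value with the next against a percentage threshold) could be sketched roughly as follows. The function name `significant_steps`, the 10% default, and the choice to skip steps that start from zero (where a relative change is undefined) are assumptions made for illustration, not part of the original comment.

```python
def significant_steps(values, threshold=0.10):
    """Return indices k+1 where the value changed by more than
    `threshold` (as a fraction) relative to the previous value L[k]."""
    flagged = []
    for k in range(len(values) - 1):
        prev, nxt = values[k], values[k + 1]
        if prev == 0:
            # Assumption: a relative change from zero is undefined, so skip it.
            continue
        if abs(nxt - prev) / abs(prev) > threshold:
            flagged.append(k + 1)
    return flagged


my_list = [0.0, 11500312.5, 12293437.5, 11896875.0, 7711186.0,
           3281768.863, 3341550.1363, 3300694.0]
print(significant_steps(my_list))
```

With the sample values from the question, this flags the zero-based indices 4 and 5, which are the fifth and sixth points mentioned in the question. A different threshold, or a different rule for zero values, may suit other lists better.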

1 Answer

Here's an approach that might work for you. Check how the value compares to the moving average. Is it more than one standard deviation away?

Here's a moving average implementation using numpy:

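import numpy as np

def running_mean(x, N):
    # prepend 0 so that differences of the cumulative sum give window sums
    cumsum = np.cumsum(np.insert(x, 0, 0))
    # mean of each length-N window; the result has len(x) - N + 1 entries
    return (cumsum[N:] - cumsum[:-N]) / float(N)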

From here

Here's an implementation of the comparison operation:

TimeSEries = [0.0, 11500312.5, 12293437.5, 11896875.0, 7711186.0,
              3281768.863, 3341550.1363, 3300694.0]

# MOV[ii] is the mean of TimeSEries[ii:ii+3], so MOV is shorter than TimeSEries
MOV = running_mean(TimeSEries, 3).tolist()
STD = np.std(MOV)
for ii in range(len(MOV)):
    # print values more than one standard deviation above the moving average
    if TimeSEries[ii] > MOV[ii] + STD:
        print(TimeSEries[ii])

From here
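
The question asks about both increases and decreases, while the loop above only reports values that lie above the moving average. A minimal sketch of a two-sided variant, assuming the same moving average and the same one-standard-deviation threshold, might look like this. The function name `flag_deviations` and its return format are hypothetical, chosen only for this example.

```python
import numpy as np


def running_mean(x, N):
    # Same cumulative-sum moving average as in the answer above.
    cumsum = np.cumsum(np.insert(x, 0, 0))
    return (cumsum[N:] - cumsum[:-N]) / float(N)


def flag_deviations(series, window=3):
    """Return (index, value, direction) for every point that lies more than
    one standard deviation above or below the moving average."""
    mov = running_mean(series, window)
    std = np.std(mov)
    flagged = []
    for i in range(len(mov)):
        diff = series[i] - mov[i]
        if abs(diff) > std:
            flagged.append((i, series[i], "above" if diff > 0 else "below"))
    return flagged


TimeSEries = [0.0, 11500312.5, 12293437.5, 11896875.0, 7711186.0,
              3281768.863, 3341550.1363, 3300694.0]
for index, value, direction in flag_deviations(TimeSEries):
    print(index, value, direction)
```

Because each average here covers the current point and the points after it, a value is flagged when it differs strongly from what follows. That is one possible reading of "significant change", and other window alignments would give different results.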

Charles Landau
  • Yes, this method works, but looking at the plot, it recognises only one significant change (the fourth point)! In the call running_mean(TimeSEries,3).tolist(), what is "3", and how should this value be chosen? – Spedo Apr 15 '19 at 15:40
  • 3 is what I arbitrarily selected for the moving average window. You could adjust the sensitivity and pad the array to catch more anomalies – Charles Landau Apr 15 '19 at 18:24
  • Yes, I got it: when I increase the value, it finds more points, and vice versa. Thanks. – Spedo Apr 16 '19 at 07:41
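
The suggestion in the comments to adjust the sensitivity and pad the array could be sketched as below. This is only one possible interpretation: the function name `flag_with_padding`, the multiplier `k` on the standard deviation, the centered window, and the use of edge padding (repeating the first and last values) are all assumptions made for this example and do not come from the answer.

```python
import numpy as np


def flag_with_padding(series, window=3, k=1.0):
    """Flag indices whose value differs from a centered moving average
    by more than k standard deviations of that moving average."""
    arr = np.asarray(series, dtype=float)
    left = (window - 1) // 2
    right = window - 1 - left
    # Assumption: repeat the edge values so every point gets an average.
    padded = np.pad(arr, (left, right), mode="edge")
    # "valid" convolution of the padded array yields len(arr) averages.
    mov = np.convolve(padded, np.ones(window) / window, mode="valid")
    threshold = k * np.std(mov)
    return [int(i) for i in np.flatnonzero(np.abs(arr - mov) > threshold)]


TimeSEries = [0.0, 11500312.5, 12293437.5, 11896875.0, 7711186.0,
              3281768.863, 3341550.1363, 3300694.0]
for k in (0.5, 1.0, 2.0):
    print(k, flag_with_padding(TimeSEries, window=3, k=k))
```

A smaller `k` lowers the threshold, so it can only flag the same points or more. The padding means the last points of the list are also compared, which the unpadded version cannot do.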