
I have the following time series: [image]

What I want to do is filter out the points that I circled in orange (for example, by setting them to NaN). The main reason to filter them out is that they don't follow the general pattern of the rest of the time series and have very different values. Is there a filter, such as a low-pass filter, or any other approach that can remove them?

In this case, by contrast, everything is normal, so I wouldn't filter out any part of the time series:

[image]

I am only interested in the part of the time series that contains the red points, that is, the part that shows that specific pattern. So I don't mind if filtering out the parts circled in orange also removes the beginning and end of the time series.

The reason why I don't want to use a threshold is that the range of values is different for every time series.

Marco
  • Wouldn't there be an implicit threshold when you use any filter? – gaganso Feb 21 '19 at 17:08
  • Yes, but the main problem is that every time series has a different range of values, so I want to find something that works without manually selecting a threshold value for every time series. – Marco Feb 21 '19 at 17:19
  • Do you care about live filtering for data that streams in? Or do you already have all the data and want to do post-filtering? – ohlr Feb 21 '19 at 17:28
  • I have windows of data like the one you see, so I just need to do post-filtering. – Marco Feb 21 '19 at 17:29
  • Is there any periodicity in the data you have? – ohlr Feb 21 '19 at 17:31
  • Yes, the repeated pattern that you see has more or less the same duration every time. – Marco Feb 21 '19 at 17:42

1 Answer


Since your data is periodic, you could try to fit a combination of multiple sines to it.

As shown here, any oscillating function can be approximated by a combination of sine functions.

So what you basically have to do is a Fourier analysis.
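One way to turn this into an outlier filter is sketched below. It is only an illustration under a few assumptions that are not part of the answer: the samples are evenly spaced, the periodic pattern is captured by the few strongest frequency components, and a point counts as an outlier when its residual from that fit exceeds `k` robust standard deviations. The names `fourier_filter`, `n_harmonics`, and `k` are made up for this example. Because the cut-off is expressed relative to the residual spread of each series, it does not need an absolute threshold per series.

```python
import numpy as np

def fourier_filter(y, n_harmonics=5, k=3.0):
    """Fit the strongest frequency components and set points that
    deviate strongly from that fit to NaN (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    spectrum = np.fft.rfft(y)

    # Indices of the n_harmonics strongest non-constant components.
    strongest = np.argsort(np.abs(spectrum[1:]))[::-1][:n_harmonics] + 1

    # Keep only the mean (index 0) and those components.
    kept = np.zeros_like(spectrum)
    kept[0] = spectrum[0]
    kept[strongest] = spectrum[strongest]
    fit = np.fft.irfft(kept, n=len(y))

    # Robust spread of the residual: median absolute deviation,
    # scaled so that it matches the standard deviation for Gaussian noise.
    residual = y - fit
    centered = residual - np.median(residual)
    mad = 1.4826 * np.median(np.abs(centered))

    filtered = y.copy()
    filtered[np.abs(centered) > k * mad] = np.nan
    return filtered, fit
```

Calling `filtered, fit = fourier_filter(values)` returns a copy of the series with the suspected outliers replaced by NaN, plus the fitted periodic curve for plotting. Both `n_harmonics` and `k` would probably need some experimentation on real data.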


Besides:

Another idea is to calculate the mean over a relatively long period. Then you can specify an interval around that mean. Everything outside that interval is treated as an outlier.
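A minimal sketch of this idea, assuming the interval is taken as the rolling mean plus or minus `k` rolling standard deviations. That choice, and the names `rolling_band_filter`, `window`, and `k`, are assumptions made for illustration. Since the width of the band is derived from each window's own spread, it scales with the range of each series.

```python
import numpy as np

def rolling_band_filter(y, window=51, k=3.0):
    """Set points outside mean +/- k * std of a centered window to NaN
    (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    filtered = y.copy()
    for i in range(len(y)):
        # Centered window, truncated at the edges of the series.
        chunk = y[max(0, i - half): i + half + 1]
        mean = chunk.mean()
        std = chunk.std()
        if abs(y[i] - mean) > k * std:
            filtered[i] = np.nan
    return filtered
```

If the mean turns out to be pulled too much by extreme values, replacing `chunk.mean()` with `np.median(chunk)` is one possible variation.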

You could also try a Kalman filter approach, under the assumption that your data is a constant level plus some Gaussian noise. The filter would then adapt to each new level and remain constant for some time.

Tutorial on Kalman filter
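A rough sketch of a one-dimensional Kalman filter for the "constant level plus Gaussian noise" model could look like the code below. The process variance `q`, the measurement variance `r`, and the gate `k` are assumed parameters that would have to be tuned or estimated for each series, and the function name `kalman_level_filter` is made up. Measurements whose innovation is larger than `k` standard deviations are set to NaN and are not used to update the level.

```python
import numpy as np

def kalman_level_filter(z, q=1e-3, r=1.0, k=3.0):
    """Track a slowly varying level and flag measurements that are
    far from the predicted level (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    x, p = z[0], r                  # initial level estimate and its variance
    levels = np.empty_like(z)
    filtered = z.copy()
    for i, zi in enumerate(z):
        p = p + q                   # predict: level unchanged, uncertainty grows
        s = p + r                   # variance of the innovation
        innovation = zi - x
        if abs(innovation) > k * np.sqrt(s):
            filtered[i] = np.nan    # treat as outlier and skip the update
        else:
            gain = p / s
            x = x + gain * innovation
            p = (1.0 - gain) * p
        levels[i] = x
    return filtered, levels
```

Because `p` keeps growing while measurements are rejected, the gate widens over time, so a persistent change of level is eventually accepted instead of being rejected forever.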

ohlr
  • Yes, the only problem I can see with your approach is that the mean might be affected too much by the beginning and end of the time series, which have low values. – Marco Feb 21 '19 at 17:33
  • For example, you could aggregate the data over time and create a histogram. Then you would have a couple of peaks in your histogram with a lot of data, and some points in between with very little data. – ohlr Feb 21 '19 at 17:43
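The histogram idea from the last comment could be sketched as follows. This is only one reading of it: values that fall into sparsely populated bins are set to NaN. The bin count `bins`, the fraction `min_fraction`, and the name `histogram_filter` are assumptions for illustration, and the input is assumed to contain no NaN values yet.

```python
import numpy as np

def histogram_filter(y, bins=20, min_fraction=0.02):
    """Set values that fall into rarely populated histogram bins to NaN
    (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    counts, edges = np.histogram(y, bins=bins)

    # Bin index of every sample; clip so the maximum lands in the last bin.
    idx = np.clip(np.digitize(y, edges) - 1, 0, bins - 1)

    # A bin is sparse if it holds less than min_fraction of all samples.
    sparse = counts[idx] < min_fraction * len(y)

    filtered = y.copy()
    filtered[sparse] = np.nan
    return filtered
```

Since the bins span the minimum to the maximum of each series, the rule is relative to the range of that series rather than to a fixed value.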