If you have a reference threshold value, you can just naively filter the data with NumPy using something like `data[data < threshold]`, with `threshold` set for example to `10_000`. Alternatively, if it does not always make sense to simply remove the outliers, you can keep the array shape and replace them with NaN using `data[data >= threshold] = np.nan` (note the flipped comparison, so that only the outliers are overwritten).
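For instance, here is a minimal sketch of both variants (the sample values are made up for illustration):

```python
import numpy as np

data = np.array([5.0, 12.0, 25_000.0, 8.0, 11_000.0, 3.0])
threshold = 10_000  # reference value above which points are outliers

# Variant 1: drop the outliers entirely (the result is shorter)
filtered = data[data < threshold]        # [5. 12. 8. 3.]

# Variant 2: keep the array shape and mark the outliers as NaN
marked = data.copy()
marked[marked >= threshold] = np.nan     # [5. 12. nan 8. nan 3.]
```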
If you do not have a reference value, then things start to get a bit more complex. There are fancy ways to detect such patterns efficiently, but most of them are quite involved.
The simplest solution is to analyse the standard deviation of your input data over a sliding window and flag outliers based on the resulting local standard deviation. You can see how to do that here (you need to combine this with something like `data[sdValues < threshold]` to remove the outliers). Note however that this method is very sensitive to values near 0.
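Below is a minimal sketch of this idea; the window size and threshold are hypothetical values that need tuning for your data. It computes the local standard deviation from the identity Var[x] = E[x²] − E[x]², using two sliding-window means:

```python
import numpy as np
from scipy.ndimage import uniform_filter1d

def rolling_std(data, window):
    # Local standard deviation via Var[x] = E[x^2] - E[x]^2,
    # computed with two sliding-window (uniform) means.
    mean = uniform_filter1d(data, size=window)
    mean_sq = uniform_filter1d(data * data, size=window)
    # Clamp tiny negative values caused by floating-point rounding
    return np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))

rng = np.random.default_rng(0)
data = rng.normal(100.0, 1.0, 500)  # smooth signal...
data[250] = 25_000.0                # ...with one injected spike

sdValues = rolling_std(data, window=11)  # window size: hypothetical
threshold = 5.0                          # local-std threshold: hypothetical
cleaned = data[sdValues < threshold]
```

Note that points near the spike are removed too, since the spike inflates the standard deviation of every window that contains it.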
An alternative solution is to compute a Gaussian or median filter and then measure the relative difference (or a more advanced distance metric) between the filtered signal and your input data (a bit like a high-pass filter). You can take a look at this post to do that.
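Here is a sketch of that approach, using a median filter from `scipy.signal` as the smooth reference; the kernel size and threshold are again hypothetical:

```python
import numpy as np
from scipy.signal import medfilt

rng = np.random.default_rng(0)
data = rng.normal(100.0, 1.0, 500)  # smooth signal...
data[250] = 25_000.0                # ...with one injected spike

# Median filter: a smooth reference that is robust to isolated spikes
baseline = medfilt(data, kernel_size=11)

# Relative difference between the raw signal and its smoothed version
# (the epsilon guards against division by a near-zero baseline)
rel_diff = np.abs(data - baseline) / np.maximum(np.abs(baseline), 1e-12)

threshold = 0.5                     # hypothetical: >50% deviation from baseline
cleaned = data[rel_diff < threshold]
```

A median filter is used here rather than a Gaussian one because it barely moves in the presence of isolated spikes, which makes the relative difference at the spike much larger.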
For these two methods, you still need to define an arbitrary threshold, but unlike the naive method, this threshold relates to the data variation rather than the raw values themselves. It is up to you to find a good threshold given the data variations, the outliers and the expected final output.
Note: you might be interested in using `scipy.signal` (especially to compute filters).