I like this question. A typical moving average/k-nearest neighbourhood estimation. A nonnparametric approach. The following should work.
foo <- function(x, thresh = 300, h = 5, window.fun = mean) {
spikes.loc <- which(x > thresh)
low.bound <- spikes - h
up.bound <- spikes + h
N <- length(spikes.loc)
x.hat <- x
for (i in 1:N) x.hat[spikes.loc[i]] <- window.fun(x[low.bound[i]:up.bound[i]])
return(x.hat)
}
This function takes your original observations vector x
, threshold, a window size (a smoothing parameter) as well as a user-specified window function. The returned value is the vector smoothed data. It only differs from original data at spikes points. Common choice of a window function is density function, so you end up with weighted average of all neighbouring data.
Please note, I am assuming your data are evenly spaced, so a simple index x[i-h] : x[i+h] gives a reasonable neighbourhood. In more general setting, a window is based on euclidean distance, but will naively costs O(N*N), where N is the number of observations, which is expensive.
In R, there are built-in nonparametric estimation / smoothing tools. The most basic one is kernel smoothing
, a generalization of moving average. It uses FFT algorithm for fast computation at O(N log(N)) costs. Please see ?ksmooth
. More advanced are KernSmooth
and sm
packages.