0

I have a vector and for each point in that vector I would like to compute the difference between the average for some range of the points immediately before that point minus the average for some range of the points immediately after that point. I have done this with a for loop because filter does not seem to have an option to apply exclusively to points after a vector point (parameter sides = only 1 or 2) and because I did not know how to shoehorn this into an apply statement since I need a function that operates on each point using its position within the vector and not just its own value. Can someone show me the way?

Here's how I did it with a for loop:

x = rep(c(1,1,1,1,1,10), 20)
x = x + 100
x = x - c(1:length(x))
lookahead = 4
y = x

for(i in (lookahead):(length(x)-lookahead))
{
y[i] = mean(x[(i-lookahead):i]) - mean(x[i:(i+lookahead)])
}
plot(x)
lines(y, col="red")

You can see from the plot what the objective is: to identify spikes (but no I don't want to be told about other ways to find spikes, I want to use my simple boxcar moving average method).

There must be a better way to calculate this vector. Thank you for any suggestions.

p.s. I see someone wants to flag this as a repeat of Calculating moving average in R However my question is different as the answers to that question (use roll_mean or filter) don't apply here without modification. If there is a way to use roll_mean or filter, I can't tell from the docs and would appreciate someone telling me how I can use either of these to calculate forward-looking moving averages instead of backward-looking moving averages. Thanks again.

Community
  • 1
  • 1
sunny
  • 3,853
  • 5
  • 32
  • 62
  • 1
    can you add an example using your code with expected output – rawr Mar 24 '15 at 19:35
  • And some data... plenty of timeseries in `?datasets` – Gregor Thomas Mar 24 '15 at 19:41
  • @Gregor I just generated some very simple data/code to use. – sunny Mar 24 '15 at 19:49
  • @rawr complete example can be run from the lines I've pasted into my original post. Thanks for any suggestions. – sunny Mar 24 '15 at 19:49
  • i'm getting odd results with your example. are you sure that's what you want? – rawr Mar 24 '15 at 20:09
  • @rawr Yes I am sure that is what I want. When I use it on data in my lab, this makes sense. I generated synthetic data (x) just to make it easier. You can see the spikes in the red for y, showing how the boxcar difference of moving averages does identify spikes. – sunny Mar 24 '15 at 20:23
  • 2
    I don't think you are getting the average of 4 values: `i:i+lookahead` is a single value. I think you need to use `i:(i+lookahead)` Also `(i-lookahead):i` ( The `:` operator has precedence over `+` and `-`.) – Andre Michaud Mar 24 '15 at 20:27
  • @AndreMichaud thanks for the correction. I have changed the code in the question, but my original question (seeking a vectorized way to do this) remains. – sunny Mar 24 '15 at 21:07

2 Answers2

1

Problem with your procedure is that it starts at i=4, and subsets x[0:4] where R trims out 0 index automatically.

y1 = RcppRoll::roll_mean(x, 5)
y1 = c(rep(NA, 4), y1) - c(y1, rep(NA, 4)) # you can use y1 = lag(y1, 4) - y1 instead if you have dplyr
# fill NA positions
y1[1:4]=x[1:4] 
y1[116:120]=x[116:120]

y1 differs from y only at positions 4, and 116 where your loop is problematic.

In case, if you have no access to RcppRoll, you can use embed(faster than zoo::rollmean).

y1 = rowMeans(embed(x, 5)) #slightly slower than roll_mean
y1 = c(rep(NA, 4), y1) - c(y1, rep(NA, 4)) # you can use y1 = lag(y1, 4) - y1 instead if you have dplyr
# fill NA positions
y1[1:4]=x[1:4] 
y1[116:120]=x[116:120]
Khashaa
  • 7,293
  • 2
  • 21
  • 37
0

OK. I have one solution however, I've modified your code for the loop to go from (lookahead+1):(length(x)-lookahead) . This is so that the very first mean is a mean of 5 values like all the rest.

Calculate a vector of averages of 5 values:

lastIndexInY <- length(x)-lookahead 
Y_ave <- (x[ 1:lastIndexInY ] + x[ 1:lastIndexInY +1] + x[ 1:lastIndexInY +2] + x[ 1:lastIndexInY +3]+ x[ 1:lastIndexInY +4] )/5

Then your result y is the same as:

y_vec  <- c(x[1:4], Y_ave[1:(length(Y_ave)-4)] - Y_ave[5:length(Y_ave) ] ,  x[-3:0 + length(x)]  )

all(y - y_vec == 0 )
[1] TRUE

(Are you sure you need to retain the first 4 and last 4 values of x?)