
I'm a bit of an R newbie and am a little stuck on how to run a correlation on time-series data where the second vector is much longer than the first and I want to use a rolling time window.

My data looks something like this:

set.seed(1)
# "Target sample"  (this is always of known fixed length N, e.g. 20 )
target <- data.frame(Date=rep(seq(Sys.Date(),by="1 day",length=20)),Measurement=rnorm(2))

# "Potential Sample" (this is always much longer and of unknown length,e.g. 730 in this example)
potential <- data.frame(Date=rep(seq(Sys.Date()-1095,by="1 day",length=730)),Measurement=rnorm(2)) 

What I would like to do is take a rolling window of size N (i.e. matching the size of the target sample), advancing one day at a time, and print two columns for each window:

WindowStartDate and the result of cor(target,potentialWindow)

So in pseudo-code (using the generated example above):

  1. Start at Sys.Date()-1095, take window size N values
  2. Print (or, probably better, put into a new data frame) Sys.Date()-1095 and the result of cor(target,potentialWindow)
  3. Roll forward +1 day to Sys.Date()-1094, take window size N values
  4. Print (or, probably better, put into a new data frame) Sys.Date()-1094 and the result of cor(target,potentialWindow)
  5. etc.

N.B. The roll forward +1 day is obviously a variable that could be tweaked depending on desired overlap.

Little Code
  • Could you please give a reproducible example? http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Hack-R Jun 19 '16 at 13:46

1 Answer


Here's a way we can do it. Note that in your original example you only specified rnorm(2), which worked because R recycles the two values to fill every row, but it's probably not what you wanted. We just need to initialize a few things, and then send it through a for loop.

It seems like we can just pull the date you want from the potential data set (column Day below), but if you want to use the Sys.Date() - X formula, I've shown how to do that as well (column Day2).
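To see the recycling concretely, here is a minimal sketch (the names recycled, full, n_distinct_recycled and n_distinct_full are only for illustration). It builds the 20-row target once with rnorm(2) and once with rnorm(20) and counts the distinct measurements in each:

```r
# Illustration only: how a length-2 vector is recycled into a 20-row data frame.
set.seed(1)
recycled <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                       Measurement = rnorm(2))
# The two random values are repeated ten times to fill the 20 rows.
n_distinct_recycled <- length(unique(recycled$Measurement))

full <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                   Measurement = rnorm(20))
# Here every row gets its own random draw.
n_distinct_full <- length(unique(full$Measurement))
```

With only two distinct values repeating, any correlation against the target would mostly reflect that alternating pattern rather than a real 20-point sample.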

set.seed(1)
# "Target sample"  (this is always of known fixed length N, e.g. 20 )
target <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                     Measurement = rnorm(20))

# "Potential Sample" (this is always much longer and of unknown length, e.g. 730 in this example)
potential <- data.frame(Date = seq(Sys.Date() - 1095, by = "1 day", length.out = 730),
                        Measurement = rnorm(730))

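# initialize values
N <- 20                                      # window size = length of the target sample
len_potential <- nrow(potential) - (N - 1)   # number of complete windows
time_start <- 1096                           # Sys.Date() - (time_start - 1) is the first date

# preallocate the result: Day is read from potential, Day2 is recomputed from Sys.Date()
result.df <- data.frame(Day = potential[1, 1],
                        Corr = numeric(len_potential),
                        Day2 = potential[1, 1],
                        stringsAsFactors = FALSE)

# use a for loop: one iteration per window start
for (i in seq_len(len_potential)) {
  result.df[i, 1] <- as.Date(potential[i, 1])                        # window start date
  result.df[i, 2] <- cor(target[, 2], potential[i:(i + N - 1), 2])   # rows i to i+N-1
  result.df[i, 3] <- Sys.Date() - (time_start - i)                   # same date via Sys.Date() - X
}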
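
If you want to tweak the roll-forward amount mentioned in the question, one option is to wrap the same idea in a function. This is only a sketch: rolling_cor and its step argument are names I made up for illustration, and it assumes both data frames have Date and Measurement columns as in the example.

```r
# Hypothetical helper: correlate `target` with every N-row window of
# `potential`, moving the window start `step` rows at a time.
rolling_cor <- function(target, potential, step = 1) {
  N <- nrow(target)
  stopifnot(nrow(potential) >= N)
  # first row of each window; the last window must still contain N rows
  starts <- seq(1, nrow(potential) - N + 1, by = step)
  data.frame(
    WindowStartDate = potential$Date[starts],
    Corr = vapply(starts,
                  function(i) cor(target$Measurement,
                                  potential$Measurement[i:(i + N - 1)]),
                  numeric(1))
  )
}

set.seed(1)
target <- data.frame(Date = seq(Sys.Date(), by = "1 day", length.out = 20),
                     Measurement = rnorm(20))
potential <- data.frame(Date = seq(Sys.Date() - 1095, by = "1 day", length.out = 730),
                        Measurement = rnorm(730))

daily  <- rolling_cor(target, potential)            # one window per day
weekly <- rolling_cor(target, potential, step = 7)  # window starts 7 days apart
```

With step = 1 this should give one row per window start, like the for loop; a larger step just skips window starts, so there is less overlap between consecutive windows.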

Also, as a note on posting questions to SO, sometimes it is helpful to provide desired output.

bouncyball
  • Hi @bouncyball, thanks for this! I will go test it and let you know. You are right about "just pull the date you want from the potential data set"; the bad description is my fault, and that is indeed what I meant. Your extra method may still be useful at some point. – Little Code Jun 19 '16 at 15:57
  • Hi @bouncyball, after a quick test, it looks like your code is doing what I'm after, except the window seems to stop one short. For example, looking at the output at http://pastebin.com/ASaUUtkH, surely the last row in result.df should be 2013-06-30? With the 5-day window I used, 2013-06-30 to 2013-07-04 inclusive would be the last expected window. Maybe I'm mistaken in my expectations though! – Little Code Jun 19 '16 at 17:10
  • @LittleCode sorry about that! I've edited my answer (I changed how `len_potential` was calculated), let me know if that works now – bouncyball Jun 19 '16 at 17:13
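
For reference, the window count discussed in these comments can be checked on a small series. A minimal sketch, assuming a 10-day series that ends on 2013-07-04 (the start date is made up for illustration) and a 5-day window:

```r
# Illustrative dates only: the real data from the pastebin is not reproduced here.
dates <- seq(as.Date("2013-06-25"), as.Date("2013-07-04"), by = "1 day")
N <- 5

# Number of complete windows: length minus (N - 1), as in len_potential.
n_windows <- length(dates) - (N - 1)

# First and last day of the final complete window.
last_start <- dates[n_windows]
last_end   <- dates[n_windows + N - 1]
```

Here last_start comes out as 2013-06-30 and last_end as 2013-07-04, which matches the last window expected in the comment above.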