I've been using the gregmisc library to perform a rolling decile ranking.
Let's say I have a vector 'X' of 1000 continuous values and I apply my function with a look-back window of 250 (which is what I use).
My current function works as follows: the first 250 records are each assigned a value between 1 and 10, based on the first window. The next record, 251, is then determined by the values at positions 2:251, record 252 by positions 3:252, and so on.
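To pin down the intended behaviour, here is a plain-loop reference sketch. It is not the code I actually run: the name rolling_decile_ref is made up for this post, it assumes there are no NAs, and it bins by the share of window values that are <= the current value rather than by quantile() bounds, so a value sitting exactly on a decile boundary may land one bin away from the quantile-based result.

```r
# Reference sketch only (hypothetical helper, assumes no NAs in x).
# Rank 1 = largest values, rank n_bins = smallest values.
rolling_decile_ref <- function(x, width = 250, n_bins = 10) {
  n <- length(x)
  out <- rep(NA_integer_, n)
  if (n < width) return(out)

  # Reversed bin of value v within window w, based on its empirical rank
  rank_in <- function(v, w) {
    k <- sum(w <= v)  # how many window values are <= v
    as.integer(n_bins + 1 - ceiling(k * n_bins / length(w)))
  }

  # The first `width` records are all ranked against the first window
  first <- x[1:width]
  for (i in 1:width) out[i] <- rank_in(x[i], first)

  # Every later record is ranked against the window that ends on it
  if (n > width) {
    for (end in (width + 1):n) {
      out[end] <- rank_in(x[end], x[(end - width + 1):end])
    }
  }
  out
}
```

Calling rolling_decile_ref(X, width = 250) on the 1000-value vector would return 1000 ranks, one per record.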
While it does the trick faster than a loop, the performance of gregmisc's "running" function with my decile ranking function leaves much to be desired.
I've been speeding up my other functions by operating over the entire time series at once, essentially building columns that hold the information needed at each point in time. With that method I've reduced processing time by as much as 95%, but I have not come up with a similar solution for this problem.
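As an illustration of the whole-series idea (a generic sketch, not one of the functions mentioned above; roll_mean_cumsum is a hypothetical name and a rolling mean is just a convenient stand-in), a windowed statistic can sometimes be derived from one cumulative column instead of one function call per window:

```r
# Illustrative only: rolling mean from a single cumulative sum.
# Assumes x is numeric with no NAs.
roll_mean_cumsum <- function(x, width) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n < width) return(out)

  cs <- c(0, cumsum(x))  # cs[i + 1] holds sum(x[1:i])
  idx <- width:n         # positions where a full window is available
  # Sum over the window ending at idx is a difference of two cumulative sums
  out[idx] <- (cs[idx + 1] - cs[idx + 1 - width]) / width
  out
}
```

The decile ranking is harder to express this way because a rank depends on the order of the whole window, not on a running total.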
Matrices may work more quickly, but I haven't seen a matrix version done well enough to beat my running() version.
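For reference, one shape the matrix idea could take is sketched below using base R's embed(). The name rolling_decile_embed is hypothetical, the sketch assumes no NAs, it leaves the first width - 1 records as NA instead of ranking them against the first window, and it bins by empirical rank rather than by quantile() bounds. Whether it is actually faster than running() would need timing on real data.

```r
# Matrix sketch (hypothetical helper, assumes no NAs in x).
# Rank 1 = largest values, rank n_bins = smallest values.
rolling_decile_embed <- function(x, width = 250, n_bins = 10) {
  n <- length(x)
  out <- rep(NA_integer_, n)
  if (n < width) return(out)

  # Each row of m is one window, newest value first:
  # row i = x[i + width - 1], x[i + width - 2], ..., x[i]
  m <- embed(x, width)

  # Count how many values in each window are <= that window's newest value
  k <- rowSums(m <= m[, 1])

  # Convert the count to a bin 1..n_bins, then reverse so the top bin is 1
  bin <- ceiling(k * n_bins / width)
  out[width:n] <- as.integer(n_bins + 1 - bin)
  out
}
```

Note that the intermediate matrix has (length(x) - width + 1) rows and width columns, so memory use grows with the window size.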
Any ideas?
Thanks!
Here is the code I'm using: one core function, then a function that applies it with running() from gregmisc:
F_getDecileVal <- function(x, deciles = 0.1) {
  len <- length(x)
  y <- array(0, dim = len)  # 0 marks values that fall in no bin (e.g. NA)
  # Probabilities 0, 0.1, ..., 1 and the matching quantile bounds
  deciles <- seq(0, 1, deciles)
  decileBounds <- quantile(x, deciles, na.rm = TRUE)
  lendecile <- length(decileBounds)
  # A value on a bound shared by two bins ends up in the higher bin
  for (i in 2:lendecile) {
    y[which(x <= decileBounds[[i]] & x >= decileBounds[[i - 1]])] <- (i - 1)
  }
  # Reverse order so the largest values get rank 1 and the smallest rank 10
  binned <- y > 0
  y[binned] <- lendecile - y[binned]
  return(y)
}
And the rolling function:
F_getDecileVal_running <- function(x, decilecut = 0.1, interval) {
  len <- length(x)
  # Modified by ML 5/4/2013
  y <- array(NA, dim = len)
  if (len >= interval) {
    y <- running(x, fun = F_getDecileVal, width = interval, records = 1, pad = TRUE, simplify = TRUE)
    y[1:interval] <- F_getDecileVal(x[1:interval])
  }
  return(y)
}
# system.time(F_getDecileVal_running(mydata[,8], interval=250))
# > dim(mydata)
# [1] 5677    9
#    user  system elapsed
#    4.28    0.00    4.38