I'm curious if anyone out there can come up with a (faster) way to calculate rolling statistics (rolling mean, median, percentiles, etc.) over a variable interval of time (windowing).
That is, suppose one is given randomly timed observations (i.e. not daily, or weekly data, observations just have a time stamp, as in ticks data), and suppose you'd like to look at center and dispersion statistics that you are able to widen and tighten the interval of time over which these statistics are calculated.
I made a simple for loop that does this. But it obviously runs very slow (In fact I think my loop is still running over a small sample of data I set up to test its speed). I've been trying to get something like ddply to do this - which seems straitforward to get to run for daily stats - but I can't seem to work my way out of it.
Example:
Sample Set-up:
df <- data.frame(Date = runif(1000,0,30))
df$Price <- I((df$Date)^0.5 * (rnorm(1000,30,4)))
df$Date <- as.Date(df$Date, origin = "1970-01-01")
Example Function (that runs really slow with many observations
SummaryStats <- function(dataframe, interval){
# Returns daily simple summary stats,
# at varying intervals
# dataframe is the data frame in question, with Date and Price obs
# interval is the width of time to be treated as a day
firstDay <- min(dataframe$Date)
lastDay <- max(dataframe$Date)
result <- data.frame(Date = NULL,
Average = NULL, Median = NULL,
Count = NULL,
Percentile25 = NULL, Percentile75 = NULL)
for (Day in firstDay:lastDay){
dataframe.sub = subset(dataframe,
Date > (Day - (interval/2))
& Date < (Day + (interval/2)))
nu = data.frame(Date = Day,
Average = mean(dataframe.sub$Price),
Median = median(dataframe.sub$Price),
Count = length(dataframe.sub$Price),
P25 = quantile(dataframe.sub$Price, 0.25),
P75 = quantile(dataframe.sub$Price, 0.75))
result = rbind(result,nu)
}
return(result)
}
Your advice would be welcome!