I've written a function that takes a data.frame which represent intervals of data which occur across a 1 minute timeframe. The purpose of the function is to take these 1 minute intervals and convert them into higher intervals. Example, 1 minute becomes 5 minute, 60 minute etc...The data set itself has the potential to have gaps in the data i.e. jumps in time so it must accommodate for these bad data occurrences. I've written the following code which appears to work but the performance is absolutely terrible on large data sets.
I'm hoping that someone could provide some suggestions on how I might be able to speed this up. See below.
compressMinute = function(interval, DAT) {
#Grab all data which begins at the same interval length
retSet = NULL
intervalFilter = which(DAT$time$min %% interval == 0)
barSet = NULL
for (x in intervalFilter) {
barEndTime = DAT$time[x] + 60*interval
barIntervals = DAT[x,]
x = x+1
while(x <= nrow(DAT) & DAT[x,"time"] < barEndTime) {
barIntervals = rbind(barIntervals,DAT[x,])
x = x + 1
}
bar = data.frame(date=barIntervals[1,"date"],time=barIntervals[1,"time"],open=barIntervals[1,"open"],high=max(barIntervals[1:nrow(barIntervals),"high"]),
low=min(barIntervals[1:nrow(barIntervals),"low"]),close=tail(barIntervals,1)$close,volume=sum(barIntervals[1:nrow(barIntervals),"volume"]))
if (is.null(barSet)) {
barSet = bar
} else {
barSet = rbind(barSet, bar)
}
}
return(barSet)
}
EDIT:
Below is a row of my data. Each row represents a 1 minute interval, I am trying to convert this into arbitrary buckets which are the aggregates of these 1 minute intervals, i.e. 5 minutes, 15 minutes, 60 minutes, 240 minutes, etc...
date time open high low close volume
2005-09-06 2005-09-06 16:33:00 1297.25 1297.50 1297.25 1297.25 98