Below is my code. For each trade, it computes the percentile of the trade price level within a rolling (historical) 15-minute window. It runs quickly on the first 500 or 1000 rows, but as you can see there are about 45K observations, and on the entire data set it is very slow. Can I apply any of the plyr functions? Any other suggestions are welcome.
This is what the trade data looks like:
> str(trade)
'data.frame': 45571 obs. of 5 variables:
$ time : chr "2013-10-20 22:00:00.489" "2013-10-20 22:00:00.807" "2013-10-20 22:00:00.811" "2013-10-20 22:00:00.811" ...
$ prc : num 121 121 121 121 121 ...
$ siz : int 1 4 1 2 3 3 2 2 3 4 ...
$ aggress : chr "B" "B" "B" "B" ...
$ time.pos: POSIXlt, format: "2013-10-20 22:00:00.489" "2013-10-20 22:00:00.807" "2013-10-20 22:00:00.811" "2013-10-20 22:00:00.811" ...
And this is what the data looks like after adding the new column trade$time.pos:
trade$time.pos <- strptime(trade$time, format="%Y-%m-%d %H:%M:%OS")
> head(trade)
time prc siz aggress time.pos
1 2013-10-20 22:00:00.489 121.3672 1 B 2013-10-20 22:00:00.489
2 2013-10-20 22:00:00.807 121.3750 4 B 2013-10-20 22:00:00.807
3 2013-10-20 22:00:00.811 121.3750 1 B 2013-10-20 22:00:00.811
4 2013-10-20 22:00:00.811 121.3750 2 B 2013-10-20 22:00:00.811
5 2013-10-20 22:00:00.811 121.3750 3 B 2013-10-20 22:00:00.811
6 2013-10-20 22:00:00.811 121.3750 3 B 2013-10-20 22:00:00.811
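As a side note on the time column: strptime returns POSIXlt, which stores each timestamp as a list of components, whereas POSIXct stores a single number of seconds. A minimal sketch of working with plain numeric seconds instead, using made-up timestamps and an assumed UTC time zone (neither is taken from my real data):

```r
# Made-up timestamps in the same format as trade$time
time_chr <- c("2013-10-20 22:00:00.489",
              "2013-10-20 22:00:00.807",
              "2013-10-20 22:16:00.000")

# POSIXct holds one number per timestamp (seconds since the epoch);
# tz = "UTC" is an assumption made only for this illustration
time_ct <- as.POSIXct(time_chr, format = "%Y-%m-%d %H:%M:%OS", tz = "UTC")
secs <- as.numeric(time_ct)

# Seconds elapsed from each trade to the last trade
elapsed <- secs[length(secs)] - secs

# TRUE for trades that fall inside the 15-minute window ending at the last trade
within_15 <- elapsed <= 15 * 60
```

With numeric seconds, the window test is an ordinary vector comparison and does not involve difftime units at all.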
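# t_15_index returns the indices of the trades executed in the 15 minutes up to and including the current trade (t-15 to t).
t_15_index <- function(data_vector, index) {
  # units = "secs" keeps the comparison in seconds regardless of difftime's automatic unit choice
  elapsed <- difftime(data_vector[index], data_vector[seq_len(index)], units = "secs")
  which(as.numeric(elapsed) <= 15 * 60)
}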
get_percentile <- function(data) {
  # use the function argument throughout, not the global trade object
  len_d <- nrow(data)
  price_percentile <- numeric(len_d)
  for (i in seq_len(len_d)) {
    t_15 <- t_15_index(data$time.pos, i)
    # ecdf(rep(...)) gives the empirical distribution of price, with each trade's price weighted by its size
    price_dist <- ecdf(rep(data$prc[t_15], data$siz[t_15]))
    # percentile of the current price within the current (t-15 to t) subset of the data
    price_percentile[i] <- price_dist(data$prc[i])
  }
  data$price_percentile <- price_percentile
  data
}
res_trade = get_percentile(trade)
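
To make clear what the loop computes: ecdf(rep(prc, siz)) evaluated at a price x equals the total size traded at prices <= x divided by the total size in the window. Below is a rough sketch of the same calculation without rep() or ecdf(). It assumes the times are numeric seconds sorted in ascending order; rolling_weighted_percentile is a hypothetical helper name and the toy data is made up. I have not benchmarked it on the full data, so treat it as one possible direction rather than a confirmed fix.

```r
# Hypothetical helper: size-weighted percentile of each trade's price within
# a trailing time window. Assumes `secs` is numeric and sorted ascending.
rolling_weighted_percentile <- function(secs, prc, siz, window = 15 * 60) {
  n <- length(secs)
  # For each trade i, index of the first trade with secs >= secs[i] - window.
  # With left.open = TRUE, findInterval counts the elements strictly below it.
  start <- findInterval(secs - window, secs, left.open = TRUE) + 1L
  out <- numeric(n)
  for (i in seq_len(n)) {
    w <- start[i]:i
    # Same value as ecdf(rep(prc[w], siz[w]))(prc[i]), without building rep()
    out[i] <- sum(siz[w][prc[w] <= prc[i]]) / sum(siz[w])
  }
  out
}

# Toy data (made up for illustration only)
secs <- c(0, 100, 500, 1000, 1100)
prc  <- c(10, 11, 10, 12, 11)
siz  <- c(1L, 2L, 1L, 3L, 2L)

pct <- rolling_weighted_percentile(secs, prc, siz)
```

For my data, secs could be obtained with as.numeric(as.POSIXct(trade$time.pos)). The loop still touches every trade in each window, so the gain, if any, would come from avoiding the repeated POSIXlt subsetting and the rep()/ecdf() construction on every iteration.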