I am using this code to create candlesticks in plotly. However, it contains a loop that is very inefficient (38 seconds to loop through 10K observations). It also uses the rbind
function, which means the date has to be converted to numeric and then back again, which doesn't appear to be straightforward considering it's a date with time.
The loop I'm trying to replace with a more efficient function is:
for(i in 1:nrow(prices)){
  x <- prices[i, ]
  # For high / low
  mat <- rbind(c(x[1], x[3]),
               c(x[1], x[4]),
               c(NA, NA))
  plot.base <- rbind(plot.base, mat)
}
For each input row, the output gets three rows: the first holds the 1st (date) and 3rd columns of the input data, the second holds the 1st and 4th columns, and the third is two NAs. The NAs are important later on for the plotting.
What is the most efficient way to achieve this?
Minimal reproducible example:
library(quantmod)
prices <- getSymbols("MSFT", auto.assign = FALSE)
# Convert to dataframe
prices <- data.frame(time = index(prices),
                     open = as.numeric(prices[,1]),
                     high = as.numeric(prices[,2]),
                     low = as.numeric(prices[,3]),
                     close = as.numeric(prices[,4]),
                     volume = as.numeric(prices[,5]))
# Create line segments for high and low prices
plot.base <- data.frame()
for(i in 1:nrow(prices)){
  x <- prices[i, ]
  # For high / low
  mat <- rbind(c(x[1], x[3]),
               c(x[1], x[4]),
               c(NA, NA))
  plot.base <- rbind(plot.base, mat)
}
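For comparison, here is a minimal sketch of one loop-free way to build the same three-rows-per-observation layout (date/high, date/low, NA/NA). It is not necessarily the fastest option. The three sample rows are the first three rows of the `dput(head(prices))` output in this post, reduced to the columns the loop actually uses. The output column name `price` is my own choice for illustration.

```r
# First three rows of the MSFT data (time, high, low only)
prices <- data.frame(
  time = as.Date(c(13516, 13517, 13518), origin = "1970-01-01"),
  high = c(30.25, 29.969999, 29.75),
  low  = c(29.4, 29.440001, 29.450001)
)

n <- nrow(prices)

# Three output rows per input row: (time, high), (time, low), (NA, NA).
# rep(..., each = 3) repeats every row index three times, so the dates
# stay of class Date and never pass through numeric.
# rbind() stacks high, low and a recycled NA as matrix rows; as.vector()
# flattens column by column, giving high1, low1, NA, high2, low2, NA, ...
plot.base <- data.frame(
  time  = prices$time[rep(seq_len(n), each = 3)],
  price = as.vector(rbind(prices$high, prices$low, NA))  # 'price' is a hypothetical name
)

# Blank out the date on every third row so the separator rows are all NA
plot.base$time[seq(3, 3 * n, by = 3)] <- NA
```

Because the date column is built by subsetting rather than by `rbind`, its class is preserved. I would expect the same to hold for a date-time (`POSIXct`) column, but I have only tried this on the `Date` column shown here.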
Edit:
dput(head(prices))
structure(list(time = structure(c(13516, 13517, 13518, 13521,
13522, 13523), class = "Date"), open = c(29.91, 29.700001, 29.629999,
29.65, 30, 29.799999), high = c(30.25, 29.969999, 29.75, 30.1,
30.18, 29.889999), low = c(29.4, 29.440001, 29.450001, 29.530001,
29.73, 29.43), close = c(29.860001, 29.809999, 29.639999, 29.93,
29.959999, 29.66), volume = c(76935100, 45774500, 44607200, 50220200,
44636600, 55017400)), .Names = c("time", "open", "high", "low",
"close", "volume"), row.names = c(NA, 6L), class = "data.frame")