Replace Loop with vectorised operation

Question

I am using this code to create candlesticks in plotly. However, it contains a loop that is very inefficient (38 seconds to loop through 10K observations). It also uses the rbind function, which means the date has to be converted to numeric and then back again, which doesn't appear to be straightforward considering it's a date with time.

The loop I'm trying to replace with a more efficient function is:

for (i in 1:nrow(prices)) {
  x <- prices[i, ]

  # For high / low
  mat <- rbind(c(x[1], x[3]),
               c(x[1], x[4]),
               c(NA, NA))

  plot.base <- rbind(plot.base, mat)
}

The output has three rows per input row: the first holds the 1st (date) and 3rd columns of the input data, the second holds the 1st and 4th columns, and the third is two NAs. The NAs are important later on for the plotting.

What is the most efficient way to achieve this?

Minimal reproducible example:

library(quantmod)

  prices <- getSymbols("MSFT", auto.assign = FALSE)

  # Convert to dataframe
  prices <- data.frame(time = index(prices),
                       open = as.numeric(prices[,1]),
                       high = as.numeric(prices[,2]),
                       low = as.numeric(prices[,3]),
                       close = as.numeric(prices[,4]),
                       volume = as.numeric(prices[,5]))

  # Create line segments for high and low prices
  plot.base <- data.frame()

  for (i in 1:nrow(prices)) {
    x <- prices[i, ]

    # For high / low
    mat <- rbind(c(x[1], x[3]),
                 c(x[1], x[4]),
                 c(NA, NA))

    plot.base <- rbind(plot.base, mat)
  }

Edit:

dput(head(prices))
structure(list(time = structure(c(13516, 13517, 13518, 13521, 
13522, 13523), class = "Date"), open = c(29.91, 29.700001, 29.629999, 
29.65, 30, 29.799999), high = c(30.25, 29.969999, 29.75, 30.1, 
30.18, 29.889999), low = c(29.4, 29.440001, 29.450001, 29.530001, 
29.73, 29.43), close = c(29.860001, 29.809999, 29.639999, 29.93, 
29.959999, 29.66), volume = c(76935100, 45774500, 44607200, 50220200, 
44636600, 55017400)), .Names = c("time", "open", "high", "low", 
"close", "volume"), row.names = c(NA, 6L), class = "data.frame")

The code is growing an object (`plot.base`). That's about the slowest operation you can do in programming. Please provide [a minimal reproducible example](http://stackoverflow.com/a/5963610/1412059) to facilitate development and testing of alternatives. — Roland, Jun 01 '16 at 12:53
@Roland the full example is in the link. I will include a minimal example in original post — Ed Wilson, Jun 01 '16 at 12:55
Sorry, I won't install a package just to recreate an example for stackoverflow. Just provide the output of `dput(head(prices))` and show the corresponding expected output. — Roland, Jun 01 '16 at 13:05
@Roland Ah ok, that makes sense. Output of dput(head(prices)) added! — Ed Wilson, Jun 01 '16 at 13:10
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/113531/discussion-between-ed-wilson-and-roland). — Ed Wilson, Jun 01 '16 at 13:11

Roland · Accepted Answer · 2016-06-01T13:19:45.647

I would be wary of a tutorial that grows an object in a loop. That's one of the slowest operations you can do in programming. (It's like buying a shelf that has exactly the room needed for your books and then replacing the shelf every time you buy a new book.)
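
To make the shelf analogy concrete, here is a minimal sketch contrasting the two patterns. The `toy` data frame and the names `grown` and `filled` are made up for illustration and are not the MSFT prices from the question; the point is only that both approaches build the same values, while the first one copies the accumulated result on every iteration.

```r
# Hypothetical toy data, standing in for the high/low columns of prices
n <- 1000
toy <- data.frame(high = seq_len(n) + 0.5, low = seq_len(n) - 0.5)

# Growing: each rbind() allocates a new object and copies all earlier rows
grown <- data.frame()
for (i in seq_len(n)) {
  grown <- rbind(grown, data.frame(y = c(toy$high[i], toy$low[i], NA)))
}

# Preallocating: size the result once, then fill positions by index
filled <- rep(NA_real_, 3 * n)
filled[seq(1, 3 * n, by = 3)] <- toy$high  # rows 1, 4, 7, ...
filled[seq(2, 3 * n, by = 3)] <- toy$low   # rows 2, 5, 8, ...
# every third position is left as NA

# Both hold the same values
identical(grown$y, filled)
```

Wrapping each half in `system.time()` should show the gap widening as `n` increases, since the growing version does more copying with every added row.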

Use subsetting like this:

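res <- data.frame(date = rep(prices[, 1], each = 3),
                  # transpose high/low, add an NA row via the NA index, flatten column-wise
                  y = c(t(prices[, 3:4])[c(1:2, NA), ]))
# blank out the date on every third row so it lines up with the NA in y
res[c(FALSE, FALSE, TRUE), 1] <- NA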
#         date     y
#1  2007-01-03 30.25
#2  2007-01-03 29.40
#3        <NA>  <NA>
#4  2007-01-04 29.97
#5  2007-01-04 29.44
#6        <NA>  <NA>
#7  2007-01-05 29.75
#8  2007-01-05 29.45
#9        <NA>  <NA>
#10 2007-01-08 30.10
#11 2007-01-08 29.53
#12       <NA>  <NA>
#13 2007-01-09 30.18
#14 2007-01-09 29.73
#15       <NA>  <NA>
#16 2007-01-10 29.89
#17 2007-01-10 29.43
#18       <NA>  <NA>

On 10K observations, the original loop took 30.54 secs; this method took 0.013 secs. — Ed Wilson, Jun 01 '16 at 13:25
