I have written a loop in R and I would like to make it run a lot faster. The task is to calculate delta values for a time
column in a data frame (a tibble). The wrinkle is that each delta should be taken relative to the previous row whose level
column (values in the range 1-9) is greater than or equal to that of the current row. I need to run this over approximately one billion rows, and current performance is substantially below one million rows per second, so I am looking for a speed-up of at least one order of magnitude.
Here is the code:
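ref <- as.numeric(rep(NA, 9)) # separate reference timestamp per level
timedelta <- function(level, time) {
  delta <- time - ref[level] # time since the last row with level >= this one
  ref[1:level] <<- time      # this row becomes the reference for all levels <= its own
  delta
}
mapply(timedelta, tl$level, tl$time)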
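To make the intended result concrete, here is a tiny self-contained run of the same logic. The five rows are made up for illustration (they are not from my real data), and I use a plain data.frame instead of a tibble so that no packages are needed.

```r
# Hypothetical sample data: five rows with a level (1-9) and a timestamp.
tl <- data.frame(level = c(3, 1, 2, 3, 1),
                 time  = c(10, 12, 15, 20, 21))

ref <- as.numeric(rep(NA, 9)) # separate reference timestamp per level

timedelta <- function(level, time) {
  delta <- time - ref[level] # time since the last row with level >= this one
  ref[1:level] <<- time      # this row becomes the reference for all levels <= its own
  delta
}

deltas <- mapply(timedelta, tl$level, tl$time)
print(deltas) # expected values: NA 2 5 10 1
```

The first delta is NA because no earlier row exists. The third row (level 2) skips the level-1 row before it and is measured against the first row (level 3), giving 15 - 10 = 5.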
How do I make that run fast?
(I have asked the same question in the context of dplyr over at "How to add flexible delta columns using dplyr?", but I did not manage to get the performance I need with dplyr, so I am asking again here.)