4

Is it possible to vectorize / speed up the execution of a FOR loop that is using the previous iteration values ?

In the reproductive example below :

  • the current production is computed from the current stock
  • the current production updates the NEXT stock
  • the next iteration used the updated stock to determine the current production, etc...

So I need to compute the stock at each iteration, in order to compute the production setpoint... Is it possible to avoid (slow) for loop ?

The current implementation takes about 45 seconds for 50k lines.

# Dummy functions for the examples. Real code is more complicated
function1 <- function(energy, stock, critical) {
    if (stock < critical) {
        return (energy)
    } else {
        return(0)
    }
}
function2 <- function(power) {
  return(round(power/100))
}
# Dummy data
d <- data.frame( "energy"= c(660, 660, 660, 660),
                 "stock" = c(20,   0,    0, 0),
                 "delivery" = c(0, 0, 2, 0),
                 "critical" = c(50, 50 ,50, 50),
                 "power" = c(0, 0, 0, 0),
                 "production" = c(0, 0, 0, 0) )

for (i in 1:length(d$energy)) {

  # Computing power, based on CUURENT stock
  d$power[i] <- function1(d$energy[i], d$stock[i], d$critical[i])

  # Computing production
  d$production[i] <- function2(d$power[i])

  # Updating NEXT stock with current production / delivery
  if (i < length(d$energy)) {
    d$stock[i+1] <- d$stock[i] + d$production[i] - d$delivery[i]
  }
}

View(d)
Stéphane V
  • 1,094
  • 2
  • 11
  • 25
  • 1
    You might be able to speed it up a bit with pseudo-loops such as `lapply` or `map`, but the problem could be in the design of your two functions, rather than in the loop itself. If you are using RStudio, try running 'Profile' on your functions to see where the time is being spent. – Andrew Gustar Nov 07 '19 at 11:25
  • is there any reason to write `for` with caps lock? –  Nov 12 '19 at 21:05
  • 1
    [This](https://stackoverflow.com/questions/42393658/lapply-vs-for-loop-performance-r) might be helpful. your function looks fine. It's as speedy as possible. It's not your foor loop but rather your functions as @AndrewGustar points out in his comment. And to answer your question, you can optimize your functions or rewrite your code in C or Fortran. –  Nov 12 '19 at 21:09

3 Answers3

3

One possibility is to use the dplyr package which is part of the tidyverse.

library(dplyr)

d %>%
  mutate(power = function1(energy, stock, critical),
         production = function2(power),
         stock_new = cumsum(stock + lag(production - delivery, 1, default = 0)))

  energy stock delivery critical power production stock_new
1    660    20        0       10   500          5        20
2    660     0        0       10   500          5        25
3    660     0        2       10   500          5        30
4    660     0        0       10   500          5        33

This works easily if the functions function1 and function2 are vectorized. If not, you would have to use purrr::map inside mutate.

Cettt
  • 11,460
  • 7
  • 35
  • 58
  • This doesn't do what the OP asked for, as `power` should be a function of `stock_new`. – Andrew Gustar Nov 07 '19 at 11:19
  • I have edited the question because function1 depends on variable "stock".... in that case, I am not able to reproduce your solution... Can you explain how to use purrr::map ? – Stéphane V Nov 07 '19 at 12:26
  • For example if `function2` is not vectorized you would replace the second line inside mutate with this: `production = purrr::map_dbl(power, function2)`. This works if your function only has one input. For two inputs you can use `purrr::map2_dbl` and for three or more inputs `pmap_dbl`. The first line inside mutate would then read as: `power = purrr::pmap_dbl(list(energy, stock, critical), function1)`. – Cettt Nov 07 '19 at 13:00
3

In base you could use Reduce with accumulate = TRUE like:

fun  <- function(x,y) {
    ttStock <- x[[2]] + x[[6]] - x[[3]]
    ttPower <- function1(y[[1]], ttStock, y[[4]])
    ttProduction <- function2(ttPower)
    c(y[[1]], ttStock, y[[3]], y[[4]], ttPower, ttProduction)
}
d$power[1] <- function1(d$energy[1], d$stock[1], d$critical[1])
d$production[1] <- function2(d$power[1])
do.call(rbind, Reduce(fun, as.data.frame(t(d[-1,])), d[1,], accumulate = TRUE))
#  energy stock delivery critical power production
#1    660    20        0       50   660          7
#2    660    27        0       50   660          7
#3    660    34        0       50   660          7
#4    660    39        2       50   660          7

To make it easier I fill in power and production in the first line of d.

In case you would use the Names instead of the column numbers:

fun  <- function(x,y) {
    names(x)  <- colnames(d)
    ttStock <- x[["stock"]] + x[["production"]] - x[["delivery"]]
    ttPower <- function1(y[[1]], ttStock, y[[4]])
    ttProduction <- function2(ttPower)
    c(y[[1]], ttStock, y[[3]], y[[4]], ttPower, ttProduction)
}
GKi
  • 37,245
  • 2
  • 26
  • 48
  • I think I can use your principle to match my needs. Is there a way to preserve the column names ? (x$stock in place of $[[2]] ) ? – Stéphane V Nov 21 '19 at 10:34
  • It seems I am dividing the computation time by 50 ! (in comparison with the for loop) – Stéphane V Nov 21 '19 at 10:37
  • 1
    @StéphaneV I added a variant where you can use names instead of column numbers. – GKi Nov 21 '19 at 10:52
  • The column names do not work because the returned value is a vector without names... So the call of the second iteration is failing... Error in x[["stock"]] : index out of bounds. – Stéphane V Nov 21 '19 at 12:41
  • 1
    @StéphaneV In `names(x) <- colnames(d)` I take the names from `d` and set them in `x`. With the functions and data from your question and the code from my answer I get a correct result. – GKi Nov 21 '19 at 12:48
0

How about saving the state across function calls.

my_env <- new.env(parent = emptyenv())
my_env$stock <- d$stock[0]

f <- function(item){
   power <- function1()
   production <- function1()/100
   stock <- my_env$stock 
   ....
   rest of the businesss logic
   ...
}

apply(d, 2, f)

abhilb
  • 5,639
  • 2
  • 20
  • 26