
I'm desperately trying to avoid for loops to calculate custom financial indicators (multiple stocks, 5,000 rows per stock). I'm trying to use purrr::map2, and it is fine when doing math on existing vectors, but I need to reference the lag (previous) value of the vector I'm trying to create. Without referencing a previous value, purrr::map2 works fine:

some_function <- function(a, b) { (a * b) + ((1 - a) * b) }
a <- c(0.019, 0.026, 0.012, 0.022)  # some indicator
b <- c(15.5, 16.7, 14.8, 13.1)  # close price
purrr::map2(a, b, some_function)

which just results in the original close values (since a*b + (1-a)*b simplifies to b):

15.5, 16.7, 14.8, 13.1

But what I'm really trying to do is create a new vector (c) that looks back on itself (lag) as part of the calculation. If it is the first row, c == b; otherwise:

desired_function <- function(a, b, c) { (a * b) + ((1 - a) * lag(c)) }

So I create a vector c, populate its first value, and try:

c <- c(15.5, 0, 0, 0)
purrr::map2(a, b, c, desired_function)

And get all NULL values, obviously.
Values for c should be: 15.50, 15.53, 15.52, 15.47
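(For example, the second value comes from 0.026 * 16.7 + (1 - 0.026) * 15.5 = 15.5312.)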

Referencing a previous value is a common thing among indicators, and it forces me to go to clunky, slow 'for loops'. Any suggestions are greatly appreciated.

Dan Hill
    try `purrr::pmap(list(a, b, c), desired_function)` – guasi Jul 03 '22 at 00:20
  • That syntax works, but my function isn't set up right to "look back" on the previous value of the created variable. Thanks for the proper syntax. – Dan Hill Jul 03 '22 at 15:23
  • I've added a small experiment to my answer which implies purrr might not be so fast after all. – Caspar V. Jul 03 '22 at 21:59

2 Answers


If calculating a value in a vector requires a previously calculated value from the same vector, the calculation simply can't be vectorized; you'll have to compute the values one after another.

For loops aren't slow by themselves; it's how you use them. For instance, retrieving values from a data frame one value at a time, or inserting them one value at a time, is a common practice that is very slow.
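
For example, a rough sketch of the difference (the data frame and column names here are made up for illustration; the slow version indexes the data frame inside the loop, the fast one pulls the columns out as plain vectors first):

# hypothetical data frame standing in for one stock's indicator (a) and close price (b)
df <- data.frame(a = runif(5000, 0.01, 0.03), b = runif(5000, 13, 17))

# slow: touches the data frame element by element on every iteration
slow_loop <- function(df) {
  out <- numeric(nrow(df))
  out[1] <- df[1, "b"]
  for (i in 2:nrow(df)) {
    out[i] <- df[i, "a"] * df[i, "b"] + (1 - df[i, "a"]) * out[i - 1]
  }
  out
}

# faster: extract the columns as plain vectors once, then loop over those
fast_loop <- function(df) {
  a <- df$a
  b <- df$b
  out <- numeric(length(a))
  out[1] <- b[1]
  for (i in 2:length(a)) {
    out[i] <- a[i] * b[i] + (1 - a[i]) * out[i - 1]
  }
  out
}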

The implementation of for loops in R has improved a lot over the past 10 years; allegedly they used to be much less efficient, and in older posts you'll find many people complaining about them.

Recommended reading:

https://www.r-bloggers.com/2018/06/why-loops-are-slow-in-r/

And these two old questions (well, their answers):

Speed up the loop operation in R

Why are loops slow in R?

A little experiment

Let's benchmark the simplest (dumbest?) for-loop implementation against purrr::map2() for a function without lag: c = a*b + (1-a)*b

On this benchmark with 10 million items, the for-loop was over 15 times faster than purrr::map2().

# functions ---------------------------------------------------------------

# lag-free version of the calculation, for purrr::map2()
desired_function <- function(a,b) { a*b + (1-a) * b }

# the same calculation as a plain for loop, with the result vector pre-allocated
des_fnc_for <- function(a, b) {
  c <- numeric(length(a))
  c[1] <- b[1]
  for(i in seq_along(a)) c[i] <- a[i] * b[i] + (1 - a[i]) * b[i]
  return(c)
}


# verify --------------------------------------------------------------------

a <- c(0.019, 0.026, 0.012, 0.022)  # some indicator
b <- c(15.5, 16.7, 14.8, 13.1)  # close price

unlist(purrr::map2(a,b,desired_function))

[1] 15.5 16.7 14.8 13.1

des_fnc_for(a,b)

[1] 15.5 16.7 14.8 13.1


# benchmark ---------------------------------------------------------------

a <- runif(10000000, 0.01, 0.03)
b <- runif(10000000, 13, 17)

system.time( des_fnc_for(a,b) )

   user  system elapsed 
  1.143   0.007   1.163 

system.time( purrr::map2(a,b,desired_function) )

   user  system elapsed 
 18.570   0.627  19.761 
Caspar V.
  • wow, that's quite a difference, Caspar. I will re-examine my for loops. I must have something wonky slowing it down. – Dan Hill Jul 03 '22 at 22:40

Here are some solutions. The first one follows your idea using stats::lag (written as stats::, because the dplyr package masks lag when loaded):

r <- numeric(4L)
for (i in 1:4) {
  # also store the new value in c[i + 1], so the next iteration finds it as its lagged value
  r[i] <- c[i + 1] <- a[i]*b[i] + (1 - a[i])*stats::lag(c)[i]
}
r
# [1] 15.50000 15.53120 15.52243 15.46913

And here is another one, using a starting value that is updated in every iteration; it is about 20% faster.

r <- numeric(4L)
sval <- 15.5
for (i in 1:4) {
  r[i] <- sval <- a[i]*b[i] + (1 - a[i])*sval
}
r
# [1] 15.50000 15.53120 15.52243 15.46913

Data:

a <- c(0.019, 0.026, 0.012, 0.022)
b <- c(15.5, 16.7, 14.8, 13.1)
c <- c(15.5, 0, 0, 0)
jay.sf
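
For completeness, the same running-value recurrence can also be written without an explicit for loop using purrr::accumulate2(), which carries the accumulated value forward for you (it still evaluates sequentially under the hood, so don't expect it to beat the loop). A minimal sketch, assuming purrr is installed and using the data above:

library(purrr)

# c[1] = b[1]; c[i] = a[i]*b[i] + (1 - a[i])*c[i - 1]
res <- accumulate2(
  .x = a[-1], .y = b[-1],                                      # rows 2..n
  .f = function(prev, a_i, b_i) a_i * b_i + (1 - a_i) * prev,
  .init = b[1]                                                 # first row: c == b
)
unlist(res)   # unlist in case the result comes back as a list
# [1] 15.50000 15.53120 15.52243 15.46913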