2

I am using a for-loop to do step-by-step calculations where several equations depend on each other. Because of this dependence, I cannot find a solution where I do the calculations inside a dataframe. My main motivation is to speed up the calculations when the Time vector is very large in the reprex below.

Could you please suggest alternatives to the following for-loop based calculations, preferably inside a dataframe in R? The only thing I can think of is using for-loop in Rcpp.

Reproducible Example

last_time <- 10
STEP = 1

Time <- seq(from = 0, to = last_time, by = STEP)

## empty vectors 
eq1 <- vector(mode = "double", length = length(Time)) 
eq2 <- vector(mode = "double", length = length(Time)) 
eq <- vector(mode = "double", length = length(Time)) 
eq3 <- vector(mode = "double", length = length(Time)) 
eq4 <- vector(mode = "double", length = length(Time)) 

## adding the first values
eq1[1] <- 25
eq2[1] <- 25
eq[1] <- 25 
eq3[1] <- 100
eq4[1] <- 2

for (t in 2:length(Time)) { 
  ## eq1
  eq1[t] <- eq[t-1] + (2.5 *  STEP * (1 - (eq[t-1])/25))
  ## eq2
  eq2[t] <- (-2 * STEP) + ((-2^2) * (STEP^2)) - (2 * eq3[t-1]) - (eq[t-1] * STEP)
  ## min.
  eq[t] <- min(eq1[t], eq2[t] )
  ## eq3
  eq3[t] <- (eq[t] - eq[t-1])/(STEP)
  ## eq4
  eq4[t] <- eq4[t-1] + (eq[t-1] * STEP) + (0.5 * eq3[t-1] * (STEP)^2)
}

Output:

my_data <- data.frame(Time, eq1, eq2, eq, eq3, eq4) 
my_data
#>    Time        eq1        eq2         eq        eq3        eq4
#> 1     0   25.00000   25.00000   25.00000 -256.00000     2.0000
#> 2     1   25.00000 -231.00000 -231.00000   25.60000  -101.0000
#> 3     2 -205.40000  225.00000 -205.40000   23.04000  -319.2000
#> 4     3 -182.36000  199.40000 -182.36000   20.73600  -513.0800
#> 5     4 -161.62400  176.36000 -161.62400   18.66240  -685.0720
#> 6     5 -142.96160  155.62400 -142.96160   16.79616  -837.3648
#> 7     6 -126.16544  136.96160 -126.16544   15.11654  -971.9283
#> 8     7 -111.04890  120.16544 -111.04890   13.60489 -1090.5355
#> 9     8  -97.44401  105.04890  -97.44401   12.24440 -1194.7819
#> 10    9  -85.19961   91.44401  -85.19961   11.01996 -1286.1037
#> 11   10  -74.17965   79.19961  -74.17965    0.00000 -1365.7934

Created on 2021-02-28 by the reprex package (v1.0.0)

jay.sf
  • 60,139
  • 8
  • 53
  • 110
umair durrani
  • 5,597
  • 8
  • 45
  • 85
  • 1
    In my experience, when I see *"calculate based on previous row"*, my first thought is generally `zoo::rollapply`, since it is relatively efficient and very "clear" for me to see and understand (readability to me sometimes beats small speed improvements). However, your rolling calcs are dependent on multiple columns, which makes it less-perfect (thought still adaptable). While there are other packages that providing rolling calcs, I don't know that you'll see much improvement, since your equations are specific (vice a good "generic", e.g., rolling mean or such). – r2evans Feb 28 '21 at 17:53
  • 2
    If you are prioritizing readability, I think what you have is likely best, and don't spend too much time optimizing into obscure/obfuscated code. If you really need performance, though, you may need to write your own C/C++ code, perhaps using [`Rcpp`](https://cran.r-project.org/web/packages/Rcpp/index.html) or [`cpp11`](https://cran.r-project.org/web/packages/cpp11/index.html). I'd think even for the C++-uninclined, the equations you have here are simple enough that you could cobble something together and get some real speed improvements. – r2evans Feb 28 '21 at 17:54
  • Thanks @r2evans. I do have a working solution in `Rcpp`, but was wondering if an R based solution was possible without a `for-loop`. – umair durrani Feb 28 '21 at 18:02
  • 1
    Have you tried mapply with stats::filter as the function and a recursive filter? – Patrick Bormann Feb 28 '21 at 19:49
  • @PatrickBormann could you please give me an example? I do not understand the `filter` argument in `stats::filter()` – umair durrani Feb 28 '21 at 20:17
  • Can't help but notice that `eq3[t-1]` gets used before it is set. No idea what the meaning of the code is, but it feels like an error. – pseudospin Feb 28 '21 at 20:44
  • The first value of `eq3` is set before the loop – umair durrani Feb 28 '21 at 21:17
  • Yes, but then it is overwritten in the first run of the loop. The second one is then used in the second run of the loop without ever being set. – pseudospin Feb 28 '21 at 21:20
  • You're right @pseudospin. This is just a toy example. I made this mistake while simplifying the actual equations. – umair durrani Feb 28 '21 at 21:27

2 Answers2

1

You could define a recursive function. A loop is faster than recursion though.

g <- function(m, STEP, time, x=2) {
  if (time == 0) m
  else {
    ## eq1
    m[x, 2] <- m[x - 1, 1] + 2.5*STEP*(1 - (m[x - 1, 1])/25)
    ## eq2
    m[x, 3] <- -2*STEP + -2^2*STEP^2 - 2*m[x - 1, 4] - m[x - 1, 1]*STEP
    ## min.
    m[x, 1] <- min(m[x, 2], m[x, 3])
    ## eq3
    m[x - 1, 4] <- (m[x, 1] - m[x - 1, 1])/STEP
    ## eq4
    m[x, 5] <- m[x - 1, 5] + m[x - 1, 1]*STEP + 0.5*m[x - 1, 4]*STEP^2
    g(m, STEP, time - 1, x + 1)
  }
}

Usage

last_time <- 10; STEP <- 1

First <- c(eq0=25, eq1=25, eq2=25, eq3=100, eq4=2)
m <- matrix(0, last_time + 1, length(First), dimnames=list(NULL, names(First)))
m[1, ] <- First
    
g(m, STEP, last_time)
#              eq0        eq1        eq2        eq3        eq4
#  [1,]   25.00000   25.00000   25.00000 -256.00000     2.0000
#  [2,] -231.00000   25.00000 -231.00000   25.60000  -101.0000
#  [3,] -205.40000 -205.40000  225.00000   23.04000  -319.2000
#  [4,] -182.36000 -182.36000  199.40000   20.73600  -513.0800
#  [5,] -161.62400 -161.62400  176.36000   18.66240  -685.0720
#  [6,] -142.96160 -142.96160  155.62400   16.79616  -837.3648
#  [7,] -126.16544 -126.16544  136.96160   15.11654  -971.9283
#  [8,] -111.04890 -111.04890  120.16544   13.60489 -1090.5355
#  [9,]  -97.44401  -97.44401  105.04890   12.24440 -1194.7819
# [10,]  -85.19961  -85.19961   91.44401   11.01996 -1286.1037
# [11,]  -74.17965  -74.17965   79.19961    0.00000 -1365.7934
jay.sf
  • 60,139
  • 8
  • 53
  • 110
1

as you asked how it works: The recursive filter function of stats::filter can be used with mapply as follows:

dataframe <-  
    mapply(stats::filter, 
           dataframe, 
           filter = vector, 
           method = "recursive")
  

where vector is e.g. c(25), which could be your first eq1[1] <- 25

The recursive filter works like a recursive loop but is a bit more elegant: Then the mapply recursive filter would do:

                   dataframe / vector
row or timepoint 1  20 
row or timepoint 2  30       + (20 * c(25))
row or timepoint 3  40       + ((20*25)+30) * c(25))

It calculates the value in the first row and uses it in the next, where it multiplies the next vector. Perhaps if you play around with stats filter and the recursive method you also get the same result. It is a row based calculation over time similar to Rcpp but more flexible.

Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
Patrick Bormann
  • 729
  • 6
  • 16