0

I have the following function which takes 4 vectors. The T vector has a given length and all 3 other vectors (pga, Sa5Hz and Sa1Hz) have a given (identical but not necessarily equal to T) lenght.

The output is a matrix with length(T) rows and length(pga) columns.

My code below seems like the perfect example of what NOT to do, however, I could not figure out a way to optimize it using an apply function. Can anyone help?

designSpectrum <- function (T, pga, Sa5Hz, Sa1Hz){

  Ts <- Sa1Hz / Sa5Hz

  #By convention, if Sa5Hz is null, set Ts as 0.
  Ts[is.nan(Ts)] <- 0

  res <- matrix(NA, nrow = length(T), ncol = length(pga))

  for (i in 1:nrow(res))
  {
    for (j in 1:ncol(res))
    {
      res[i,j] <- if(T[i] <= 0) {pga[j]}
                  else if (T[i] <= 0.2 * Ts[j]) {pga[j] + T[i] * (Sa5Hz[j] - pga[j]) / (0.2 * Ts[j])}
                  else if (T[i] <= Ts[j]) {Sa5Hz[j]}
                  else Sa1Hz[j] / T[i]
      }
  }

  return(res)
}
  • 2
    belongs @ http://codereview.stackexchange.com – ivan_pozdeev Sep 30 '15 at 01:10
  • 1
    Still the myth that `for` loops are less efficient than `apply` functions. See many discussions: [here](http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar), [here](http://stackoverflow.com/questions/7142767/why-are-loops-slow-in-r/7142982#7142982), [here](https://kbroman.wordpress.com/2013/04/02/apply-vs-for/), and [here](https://www.r-project.org/doc/Rnews/Rnews_2008-1.pdf). – Parfait Sep 30 '15 at 03:52

1 Answers1

1

Instead of doing a double for loop and processing each i and j value separately, you could use the outer function to process all of them in one shot. Since you're now processing multiple i and j values simultaneously, you could switch to the vectorized ifelse statement instead of the non-vectorized if and else statements:

designSpectrum2 <- function (T, pga, Sa5Hz, Sa1Hz) {
  Ts <- Sa1Hz / Sa5Hz
  Ts[is.nan(Ts)] <- 0
  outer(1:length(T), 1:length(pga), function(i, j) {
    ifelse(T[i] <= 0, pga[j],
      ifelse(T[i] <= 0.2 * Ts[j], pga[j] + T[i] * (Sa5Hz[j] - pga[j]) / (0.2 * Ts[j]),
        ifelse(T[i] <= Ts[j], Sa5Hz[j], Sa1Hz[j] / T[i])))
  })
}
identical(designSpectrum(T, pga, Sa5Hz, Sa1Hz), designSpectrum2(T, pga, Sa5Hz, Sa1Hz))
# [1] TRUE

Data:

T <- -1:3
pga <- 1:3
Sa5Hz <- 2:4
Sa1Hz <- 3:5

You can see the efficiency gains by testing on rather large vectors (here I'll use an output matrix with 1 million entries):

# Larger vectors
set.seed(144)
T2 <- runif(1000, -1, 3)
pga2 <- runif(1000, -1, 3)
Sa5Hz2 <- runif(1000, -1, 3)
Sa1Hz2 <- runif(1000, -1, 3)

# Runtime comparison
all.equal(designSpectrum(T2, pga2, Sa5Hz2, Sa1Hz2), designSpectrum2(T2, pga2, Sa5Hz2, Sa1Hz2))
# [1] TRUE
system.time(designSpectrum(T2, pga2, Sa5Hz2, Sa1Hz2))
#    user  system elapsed 
#   4.038   1.011   5.042 
system.time(designSpectrum2(T2, pga2, Sa5Hz2, Sa1Hz2))
#    user  system elapsed 
#   0.517   0.138   0.652 

The approach with outer is almost 10x faster.

josliber
  • 43,891
  • 12
  • 98
  • 133