I'm currently writing code that will call a specific function depending on the value of an element in a vector. My question is whether using ifelse for this is efficient. If I understand ifelse correctly, whatever expressions I pass as its 2nd and 3rd arguments are computed in their entirety and then subsetted based on the TRUE or FALSE values of my condition. This is in contrast to the typical if/else structure, where we evaluate a condition first and run a function on the element only once we know which function to run. To test this, I tried the following:

test1 <- function() {
  x <- sample(1:1e9, 1e6, replace = TRUE)
  y <- ifelse(x %% 2 == 0, x**2, x/2)
  return(y)
}

test2 <- function() {
  x <- sample(1:1e9, 1e6, replace = TRUE)
  y <- numeric(length(x))
  for (i in 1:length(x)) {
    if (x[i] %% 2 == 0) {
      y[i] <- x[i]**2
    } else {
      y[i] <- x[i]/2
    }
  }
  return(y)
}
microbenchmark::microbenchmark(test1(), test2(), times = 1000)

Unit: milliseconds
    expr       min        lq     mean    median        uq      max neval
 test1()  2.366067  2.494746  8.27343  2.580164  2.706826 1690.049  1000
 test2() 21.773385 23.050818 29.70450 23.712907 29.468783 3169.008  1000

The timings (a median of about 2.6 ms for test1() versus about 23.7 ms for test2()) seem to indicate that the ifelse approach is faster than the if/else loop.

The reason I'm asking is that I'll be parsing relatively large XML files, and the parsing method I use will vary depending on the layout of the children in the tree, so I'm trying to be as efficient as possible.

So two questions: 1) Are my conclusions above correct, that ifelse is faster than if/else, and 2) does ifelse calculate all values for both yes and no vectors and then subset them?

Thanks in advance.

Edit

The code above, as well as some of the question text, has been modified to reflect the comments below.

tblznbits
  • `ifelse` has a whole bunch of checking of inputs etc that your `test2` function does not have. This probably accounts for most of the nanoseconds of difference. The core of `ifelse` is essentially an `if/else` the same as what you have + some adjustment for `NA` values. – thelatemail Feb 10 '16 at 22:25
  • Your `test2` has an error in it; try running it by itself. Also your benchmark command should call `test1()` and `test2()`, otherwise it's not actually running the code. – Aaron left Stack Overflow Feb 10 '16 at 22:37
  • And if you really want to see the performance difference, pre-simulate `x` and pass it in to your functions. – Gregor Thomas Feb 10 '16 at 22:39
  • @aaron Good call! I'll update the question to reflect that. – tblznbits Feb 10 '16 at 22:43
  • You should find that `ifelse` is substantially faster for this example. – Aaron left Stack Overflow Feb 10 '16 at 22:44
  • @Aaron Yeah, that's what I just found. Had I actually formatted my reproducible example correctly, this question probably wouldn't have even been asked. I'll be deleting it since there's no longer a real question here. – tblznbits Feb 10 '16 at 22:46
  • In response to 2), according to the help file, "yes will be evaluated if and only if any element of test is true, and analogously for no," so the answer is yes except for the case where one is never needed. – Aaron left Stack Overflow Feb 10 '16 at 22:46
  • Also see the suggestions in the help file under the `Warning` section that give an example of a type of construction that might be preferred. – Aaron left Stack Overflow Feb 10 '16 at 22:51
  • @gregor I don't necessarily have a question anymore, but SO wouldn't let me delete the question because it had an answer. – tblznbits Feb 10 '16 at 22:56

1 Answer

The way you've coded it, the loop does worse than ifelse, but as suggested in the Warning section of ?ifelse, it's possible to do better. With your simple functions, x^2 and x / 2, the test3() function below is about 2 to 3 times faster than ifelse and about 30 times faster than test2(). With more computationally intensive functions (but still vectorized!) the margin might be bigger.

The speed gain is (I think) mostly due to two sources:

  1. ifelse does input checking and error handling that test3() skips. (ifelse is more general and more flexible, whereas test3() is hardcoded to only return a numeric vector.)
  2. As demonstrated at "Does ifelse really calculate both of its vectors every time? Is it slow?", ifelse will calculate its entire TRUE response vector as long as there is at least 1 TRUE value of the test, and similarly for its FALSE response vector. test3() bypasses the extra calculations by creating TRUE and FALSE sub-vectors.
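
To make the second point concrete, here is a minimal sketch that records how many elements each branch function receives. The helpers `square_logged()` and `halve_logged()` and the `seen` environment are hypothetical names introduced only for this illustration; they are not part of the question's code.

```r
# Hypothetical logging helpers: each one records the length of its input
# in `seen` before doing the real computation.
seen <- new.env()
seen$lengths <- integer(0)

square_logged <- function(v) {
  seen$lengths <- c(seen$lengths, length(v))
  v^2
}
halve_logged <- function(v) {
  seen$lengths <- c(seen$lengths, length(v))
  v / 2
}

x <- 1:10
cond <- x %% 2 == 0

# ifelse(): both branch expressions are evaluated on the full vector
y1 <- ifelse(cond, square_logged(x), halve_logged(x))

# Logical subsetting: each branch only sees the elements it is needed for
y2 <- numeric(length(x))
y2[cond]  <- square_logged(x[cond])
y2[!cond] <- halve_logged(x[!cond])

seen$lengths        # input lengths in call order
identical(y1, y2)   # both approaches should give the same result
```

With a 50/50 split like this one, the subsetting version does roughly half the arithmetic, which should matter more as the branch functions get more expensive.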

I've modified your test1() and test2() to simplify a bit, pulling out the data simulation (since that's not what we want to test). I added test3 that uses logical subsets. I also drastically reduced the size of the test vector so it runs reasonably quickly.

set.seed(47)
x <- sample(1:1e6, 1e4, replace = TRUE)

test1 <- function(x) {
  ifelse(x %% 2 == 0, x**2, x/2)
}

test2 <- function(x) {
  y <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] %% 2 == 0) {
      y[i] <- x[i]**2
    } else {
      y[i] <- x[i]/2
    }
  }
  return(y)
}

test3 <- function(x) {
    y = numeric(length(x))
    cond = x %% 2 == 0
    y[cond] = x[cond] ^ 2
    y[!cond] = x[!cond] / 2
    return(y)
}

identical(test1(x), test2(x))
# TRUE
identical(test1(x), test3(x))
# TRUE
microbenchmark::microbenchmark(test1(x), test2(x), test3(x), times = 1000)
# Unit: microseconds
#      expr       min         lq       mean     median        uq        max neval cld
#  test1(x)  1563.270  1642.3540  1701.3877  1669.2180  1697.894   3159.743  1000  b 
#  test2(x) 17909.833 18788.9635 23682.1516 19882.8600 20679.436 116206.536  1000   c
#  test3(x)   627.241   668.7445   691.8433   680.6675   696.061   1340.507  1000 a  
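
One caveat that follows from the first point: test3() skips the NA handling that ifelse does, and as far as I know a logical index containing NA is not allowed in a subscripted assignment with a longer replacement vector. Below is a sketch of an NA-aware variant, assuming you want positions with an NA condition to come back as NA (which is what ifelse returns). The name `test3_na` is made up for this example.

```r
# Hypothetical NA-aware variant of test3(); the function name is illustrative.
test3_na <- function(x) {
  y <- rep(NA_real_, length(x))  # positions with an NA condition stay NA
  cond <- x %% 2 == 0
  yes_idx <- which(cond)         # which() keeps only TRUE positions, dropping NA
  no_idx  <- which(!cond)
  y[yes_idx] <- x[yes_idx]^2
  y[no_idx]  <- x[no_idx] / 2
  y
}

x_na <- c(1L, 2L, NA, 4L)
res_subset <- test3_na(x_na)
res_ifelse <- ifelse(x_na %% 2 == 0, x_na^2, x_na / 2)
all.equal(res_subset, res_ifelse)
```

I haven't benchmarked this variant; the extra which() calls presumably cost a little, so it's only worth using if NA values can actually occur in your data.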
Gregor Thomas
  • It seems as if the answer is that `ifelse` is much faster than `if/else`, but code can typically be rewritten in a way that is even faster than `ifelse`, which makes total sense to me. Thanks! – tblznbits Feb 10 '16 at 23:01