I'm currently writing code that calls a specific function depending on the value of an element in a vector, and my question is whether this is efficient. If I understand the ifelse algorithm correctly, whatever I pass as its 2nd and 3rd arguments is calculated in its entirety and then subsetted based on the TRUE or FALSE values of my condition. This is in contrast to the typical if/else structure, where we evaluate a condition first and only then run the appropriate function on that element. To test this, I tried the following:
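# Vectorized version: both branches are written as whole-vector expressions.
test1 <- function() {
  x <- sample(1:1e9, 1e6, replace = TRUE)
  y <- ifelse(x %% 2 == 0, x^2, x / 2)
  return(y)
}

# Loop version: the condition is checked per element before choosing a branch.
test2 <- function() {
  x <- sample(1:1e9, 1e6, replace = TRUE)
  y <- numeric(length(x))
  for (i in seq_along(x)) {
    if (x[i] %% 2 == 0) {
      y[i] <- x[i]^2
    } else {
      y[i] <- x[i] / 2
    }
  }
  return(y)
}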

microbenchmark::microbenchmark(test1(), test2(), times = 1000)

Unit: milliseconds
    expr       min        lq     mean    median        uq      max neval
 test1()  2.366067  2.494746  8.27343  2.580164  2.706826 1690.049  1000
 test2() 21.773385 23.050818 29.70450 23.712907 29.468783 3169.008  1000

The mean values seem to indicate that the ifelse approach is favorable over if/else.
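For comparison, a third variant that sits between the two approaches could be sketched as below. It is not part of the benchmark above, and the name test3 is hypothetical. It stays vectorized but applies each calculation only to the elements that need it, using plain logical indexing.

```r
# Hypothetical third variant (not benchmarked above): vectorized,
# but each branch is computed only on its own subset of x.
test3 <- function(x) {
  y <- numeric(length(x))
  even <- x %% 2 == 0          # logical mask, computed once
  y[even]  <- x[even]^2        # squares only the even elements
  y[!even] <- x[!even] / 2     # halves only the odd elements
  y
}

x <- c(1L, 2L, 3L, 4L)
y_subset <- test3(x)
y_ifelse <- ifelse(x %% 2 == 0, x^2, x / 2)
print(all.equal(y_subset, y_ifelse))
```

On this small input, the two results should agree; I have not timed this variant.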
The reason I'm asking is that I'll be parsing relatively large XML files, and the parsing method I apply will vary depending on the layout of the children in the tree, so I'm trying to be as efficient as possible.
So, two questions: 1) Is my conclusion above correct, that ifelse is faster than if/else, and 2) does ifelse calculate all values for both the yes and no vectors and then subset them?
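One way to probe the second question would be a small sketch like the following. The helpers square and halve and the calls log are hypothetical names introduced only for this check: each helper records how many elements it was handed before doing its work, so the log shows whether ifelse passes the full vector to both branches.

```r
# Hypothetical probe: log the input length each branch function receives.
calls <- character(0)

square <- function(v) {
  calls <<- c(calls, paste("square", length(v)))  # record the call
  v^2
}

halve <- function(v) {
  calls <<- c(calls, paste("halve", length(v)))   # record the call
  v / 2
}

x <- c(1L, 2L, 3L, 4L)
y <- ifelse(x %% 2 == 0, square(x), halve(x))

print(calls)  # one entry per branch function that was actually evaluated
print(y)
```

If both entries in the log report the full length of x, that would suggest each branch is computed on the whole vector and only subsetted afterwards.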
Thanks in advance.
Edit
The code above, as well as some of the question text, has been modified to reflect the comments below.