18

When I was re-reading Hadley's Advanced R recently, I noticed that Chapter 6 says `if` can be called as a function, as in `if`(i == 1, print("yes"), print("no")). (If you have the physical book in hand, it's on page 80.)

We know that ifelse is slow (see "Does ifelse really calculate both of its vectors every time? Is it slow?") because it evaluates all of its arguments. Would `if` be a good alternative, given that it seems to evaluate only the branch that is selected (this is just my assumption)?
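To make the evaluation question concrete, here is a minimal sketch that records which arguments are actually evaluated. The helper `note()` and the environment `seen` are hypothetical names made up for this illustration. The behaviour shown for ifelse reflects how base R's ifelse is written as far as I know: it evaluates yes only if at least one test is TRUE, and no only if at least one is FALSE.

```r
# Hypothetical helper: logs a label when its argument is evaluated, then returns the value.
seen <- new.env()
seen$labels <- character(0)
note <- function(label, value) {
  assign("labels", c(get("labels", envir = seen), label), envir = seen)
  value
}

# `if` evaluates only the selected branch.
scalar_result <- if (TRUE) note("yes", 1) else note("no", 2)
after_if <- seen$labels            # "yes"

# ifelse with a mixed test evaluates both yes and no (each once, as whole vectors).
seen$labels <- character(0)
mixed <- ifelse(c(TRUE, FALSE), note("yes", c(1, 1)), note("no", c(2, 2)))
after_mixed <- seen$labels         # "yes" "no"

# ifelse with an all-TRUE test never touches no.
seen$labels <- character(0)
all_true <- ifelse(c(TRUE, TRUE), note("yes", c(1, 1)), note("no", c(2, 2)))
after_all_true <- seen$labels      # "yes"
```

So the difference is not that ifelse always computes everything, but that it computes each needed argument for the whole vector, whereas `if` handles a single condition.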


Update: Based on the answers from @Benjamin and @Roman and the comments from @Gregor and many others, ifelse seems to be the better solution for vectorized calculations. I'm accepting @Benjamin's answer because it provides a more comprehensive comparison and is more useful to the community. However, both answers (and the comments) are worth reading.

Hao

3 Answers

18

This is more of an extended comment building on Roman's answer, but I need code formatting to expound:

Roman is correct that if is faster than ifelse, but I am under the impression that the speed boost of if isn't particularly interesting since it isn't something that can easily be harnessed through vectorization. That is to say, if is only advantageous over ifelse when the cond/test argument is of length 1.
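A small sketch of the length-1 restriction: passing a longer condition to `if` is an error in recent R versions (4.2.0 and later, to my knowledge) and a warning in older ones, where only the first element was used. The `tryCatch` wrapper below is only there so the example runs on either.

```r
cond <- c(TRUE, FALSE, TRUE)

# `if` cannot branch on a vector: depending on the R version this warns or errors.
outcome <- tryCatch(
  {
    if (cond) "branch taken"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)

# ifelse handles the same condition element by element.
vectorised <- ifelse(cond, "a", "b")
```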

Consider the following functions, which are an admittedly weak attempt at vectorizing if without the side effect of evaluating both the yes and no arguments, as ifelse does.

ifelse2 <- function(test, yes, no){
  result <- rep(NA, length(test))
  for (i in seq_along(test)){
    result[i] <- `if`(test[i], yes[i], no[i])
  }
  result
}

ifelse2a <- function(test, yes, no){
  sapply(seq_along(test),
         function(i) `if`(test[i], yes[i], no[i]))
}

ifelse3 <- function(test, yes, no){
  result <- rep(NA, length(test))
  logic <- test
  result[logic] <- yes[logic]
  result[!logic] <- no[!logic]
  result
}


set.seed(pi)
x <- rnorm(1000)

library(microbenchmark)
microbenchmark(
  standard = ifelse(x < 0, x^2, x),
  modified = ifelse2(x < 0, x^2, x),
  modified_apply = ifelse2a(x < 0, x^2, x),
  third = ifelse3(x < 0, x^2, x),
  fourth = c(x, x^2)[1L + ( x < 0 )],
  fourth_modified = c(x, x^2)[seq_along(x) + length(x) * (x < 0)]
)

Unit: microseconds
            expr     min      lq      mean  median       uq      max neval cld
        standard  52.198  56.011  97.54633  58.357  68.7675 1707.291   100 ab 
        modified  91.787  93.254 131.34023  94.133  98.3850 3601.967   100  b 
  modified_apply 645.146 653.797 718.20309 661.568 676.0840 3703.138   100   c
           third  20.528  22.873  76.29753  25.513  27.4190 3294.350   100 ab 
          fourth  15.249  16.129  19.10237  16.715  20.9675   43.695   100 a  
 fourth_modified  19.061  19.941  22.66834  20.528  22.4335   40.468   100 a 

SOME EDITS: Thanks to Frank and Richard Scriven for noticing my shortcomings.

As you can see, breaking up the vector so that it can be passed to if one element at a time is time-consuming and ends up being slower than just running ifelse (which is probably why no one has bothered to implement my solution).

If you're really desperate for an increase in speed, you can use the ifelse3 approach above. Or better yet, Frank's less obvious* but brilliant solution.

* By 'less obvious' I mean that it took me two seconds to realize what he did. And per nicola's comment below, please note that the fourth approach works only when yes and no have length 1; otherwise you'll want to stick with ifelse3.
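The length-1 caveat can be checked on a small input (the test vector is the one Fierr posted in the comments below). This is just an illustrative check, not part of the benchmark:

```r
x <- c(3, 3, -4, -5, 6)

expected <- ifelse(x < 0, x^2, x)            # 3 3 16 25 6

# fourth: the index is only ever 1 or 2, so it picks x[1] or x[2] every time.
fourth <- c(x, x^2)[1L + (x < 0)]            # 3 3 3 3 3 (wrong here)

# fourth_modified: the offset length(x) jumps into the x^2 half at the same position.
fourth_modified <- c(x, x^2)[seq_along(x) + length(x) * (x < 0)]

# With length-1 yes and no, the simple index trick is fine.
labels <- c("Non-Negative", "Negative")[1L + (x < 0)]
```

Here `fourth_modified` matches `expected`, while `fourth` does not, which is why the simple form is only safe for scalar yes and no.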
Benjamin
  • 2
    Um, `fourth = c("Non-Negative", "Negative")[1L + ( x < 0 )]` – Frank Nov 30 '15 at 19:10
  • I'm noticing bigger problems than that, myself. But thanks for catching that one. My modifications are only going to work if `yes` and `no` have length 1, so I've got some improvements to do. (which will likely slow them down anyway) – Benjamin Nov 30 '15 at 19:12
  • Why not use `sapply` for `ifelse2`? `for` is much slower and gives unfair advantage :) I agree that the fourth case would be the best one, since that is what `ifelse` seems to do after all the necessary checks – romants Nov 30 '15 at 19:24
  • 1
    I was going for simplicity over elegance, I suppose (simplicity being subjective). But am adding it since you asked. It still runs slower than the base `ifelse`. – Benjamin Nov 30 '15 at 19:29
  • 2
    @RomanTsegelskyi if you think that "`for` is much slower than `sapply`", it means that you didn't mess with R enough. – nicola Nov 30 '15 at 19:38
  • @benjamin, yeah I agree, it is slower, thanks for adding it though :) @nicola you might be right, but in this case `sapply` is faster than `for`. However, the "much" was probably an overstatement – romants Nov 30 '15 at 19:41
  • 2
    @RomanTsegelskyi In many instances `sapply` can be slower than a `for` loop, since it calls `simplify2array`. On the other hand, `lapply` and `vapply` are _generally_ faster than a loop. No `*apply` is actually _much_ faster than a loop, despite what many think. – nicola Nov 30 '15 at 19:52
  • 1
    @Benjamin The `fourth` solution doesn't work. Are you aware of what you are doing? That solution works when both `yes` and `no` have length one. – nicola Nov 30 '15 at 20:07
  • @nicola, am I aware of what I am doing? Not often, no :) That's what I get for throwing in code that I haven't tested and played with. – Benjamin Nov 30 '15 at 20:24
  • Fourth solution doesn't work. Please remove this solution. Test: `x <- c(3,3,-4,-5,6); ifelse(x < 0, x^2, x); c(x, x^2)[1L + ( x < 0 )]` – Fierr Jun 28 '18 at 15:00
  • 1
    @Fierr, it is stated in the note at the end of the answer that `fourth` only works in the case that `yes` and `no` have length 1 (i.e. `x` has length 1). I have added a modified version that works on a similar concept and is still similarly quick. – Benjamin Jun 28 '18 at 15:19
10

if is a primitive (compiled) function called through the .Primitive interface, while ifelse is an ordinary function written in R (byte-compiled R code), so it seems that if will be faster. Running some quick benchmarks:

> microbenchmark(`if`(TRUE, "a", "b"), ifelse(TRUE, "a", "b"))
Unit: nanoseconds
                   expr  min   lq    mean median     uq   max neval cld
 if (TRUE) "a" else "b"   46   54  372.59   60.0   68.0 30007   100  a 
 ifelse(TRUE, "a", "b") 1212 1327 1581.62 1442.5 1617.5 11743   100   b

> microbenchmark(`if`(FALSE, "a", "b"), ifelse(FALSE, "a", "b"))
Unit: nanoseconds
                    expr  min   lq    mean median   uq   max neval cld
 if (FALSE) "a" else "b"   47   55   91.64   61.5   73  2550   100  a 
 ifelse(FALSE, "a", "b") 1256 1346 1688.78 1460.0 1677 17260   100   b

It seems that, leaving aside the code in the actual branches, if is at least 20x faster than ifelse. However, note that this doesn't account for the complexity of the expression being tested or for possible optimizations on it.

Update: Please note that this quick benchmark represents a very simplified and somewhat biased use case of if vs ifelse (as pointed out in the comments). While it is correct, it underrepresents the ifelse use cases; for those, Benjamin's answer seems to provide a fairer comparison.

romants
  • Thanks @romantsegelskyi! I guess `if` wins the 1st performance round. I will wait for a while for some possible new answers before I accept yours. :) – Hao Nov 30 '15 at 18:48
  • 1
    Ok, so you've saved under a second... in what context could this possibly be useful? Folks should just use the right tool for the job. – Frank Nov 30 '15 at 18:58
  • @Frank It doesn't matter if you are using R in the regular way. However, if you are using R to build a shiny site, and you use quite a few ifelse, that time difference might affect your user experience a little bit. :P – Hao Nov 30 '15 at 19:01
  • 4
    @Frank, why does everything have to be useful? What happened to just being curious? :) – romants Nov 30 '15 at 19:02
  • 4
    Curiosity killed the cat. – Rich Scriven Nov 30 '15 at 19:03
  • @RichardScriven curiosity killed the cat, but satisfaction brought it back. – Shondeslitch Nov 30 '15 at 19:12
  • 6
    Curiosity is well and good; the danger lies in R beginners reading this and thinking "Instead of `ifelse` I should always use `if` because it is faster," without understanding the difference in use cases. – Gregor Thomas Nov 30 '15 at 19:19
  • 4
    It also seems very unsurprising - of course `if` will be faster than `ifelse` for evaluating a single condition, and `+` and `-` will be faster than `sum` and `diff` when only two numbers are involved. – Gregor Thomas Nov 30 '15 at 19:21
0

Yes. I wrote a for loop over 152589 records: using ifelse() it took 90 min, and switching to if() improved that to 25 min.

for(i in ...){
  # "Case 1"
  # asesorMinimo<-( dummyAsesor%>%filter(FechaAsignacion==min(FechaAsignacion)) )[1,] 
  # asesorRegla<-tail(dummyAsesor%>%filter( FechaAsignacion<=dumFinClase)%>%arrange(FechaAsignacion),1)
  # # Assign the advisor
  # dummyRow<-dummyRow%>%mutate(asesorRetencion=ifelse(dim(asesorRegla)[1]==0,asesorMinimo$OperadorNombreApellido,asesorRegla$OperadorNombreApellido))



  # "Case 2"
  asesorRegla<-tail(dummyAsesor%>%filter( FechaAsignacion<=dumFinClase)%>%arrange(FechaAsignacion),1)
  asesorMinimo<-( dummyAsesor%>%filter(FechaAsignacion==min(FechaAsignacion)) )[1,] 
  if(dim(asesorRegla)[1]==0){
    dummyRow<-dummyRow%>%mutate(asesorRetencion=asesorMinimo[1,7])
  }else{
    dummyRow<-dummyRow%>%mutate(asesorRetencion=asesorRegla[1,7])
  }

}
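Beyond speed, the two cases differ in what they return when the test is a single value. The sketch below uses small made-up data frames (`rule_matches`, `fallback`, and the `name` column are hypothetical, not the data from the loop above) to show that ifelse shapes its result like the test, so a length-1 test yields a length-1 result, while if/else returns the selected value unchanged.

```r
# Hypothetical stand-ins for the filtered data frames in the loop.
rule_matches <- data.frame(name = character(0), stringsAsFactors = FALSE)
fallback     <- data.frame(name = c("Ana", "Luis"), stringsAsFactors = FALSE)

# ifelse: the result has the length of the test (1), so only the first name survives.
chosen_ifelse <- ifelse(nrow(rule_matches) == 0, fallback$name, rule_matches$name)

# if/else: the chosen branch is returned as-is, with both names.
chosen_if <- if (nrow(rule_matches) == 0) fallback$name else rule_matches$name
```

For a scalar condition like `dim(asesorRegla)[1] == 0`, a plain if/else is the more natural fit regardless of timing.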
  • Welcome to StackOverflow! Your answer is an empirical statement of fact. I suspect that OP is aware of this fact and needed advice or an explanation. There are good answers to this question with explanations. If you think that you can give a better answer, then please provide extra details. – Maxim Sagaydachny Dec 18 '19 at 20:19