There has been some discussion about how ifelse is not the best option for code where speed is an important factor. You might instead try:
df$Mean.Result1 <- c("", "Equal")[(df$A > 0.05 & df$B > 0.05)+1]
To see what's going on here, let's break down the command. df$A > 0.05 & df$B > 0.05 returns TRUE if both A and B exceed 0.05, and FALSE otherwise. Adding 1 coerces the logical values to numbers (TRUE becomes 1, FALSE becomes 0), so (df$A > 0.05 & df$B > 0.05)+1 returns 2 if both A and B exceed 0.05 and 1 otherwise. These are used as indices into the vector c("", "Equal"), so we get "Equal" when both exceed 0.05 and "" otherwise.
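To make the intermediate steps concrete, here is a small sketch on a made-up four-row data frame (the name toy and its values are just for illustration, not from the question). It prints the logical vector, the indices derived from it, and the resulting labels:

```r
# Hypothetical toy data, chosen so that only row 3 has both values above 0.05
toy <- data.frame(A = c(0.01, 0.20, 0.50, 0.03),
                  B = c(0.30, 0.02, 0.90, 0.01))

both <- toy$A > 0.05 & toy$B > 0.05   # logical: TRUE only where both exceed 0.05
idx <- both + 1                       # arithmetic coerces FALSE -> 0, TRUE -> 1, then adds 1
labels <- c("", "Equal")[idx]         # index 1 picks "", index 2 picks "Equal"

print(both)
# [1] FALSE FALSE  TRUE FALSE
print(idx)
# [1] 1 1 2 1
print(labels)
# [1] ""      ""      "Equal" ""
```

The lookup vector can hold any two labels; the only requirement is that the value for FALSE comes first and the value for TRUE second.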
Here's a comparison on a data frame with 1 million rows:
# Build dataset and functions
set.seed(144)
big.df <- data.frame(A = runif(1000000), B = runif(1000000))
OP <- function(df) {
  df$Mean.Result1 <- ifelse(df$A > 0.05 & df$B > 0.05, "Equal", "")
  df
}
josilber <- function(df) {
  df$Mean.Result1 <- c("", "Equal")[(df$A > 0.05 & df$B > 0.05)+1]
  df
}
all.equal(OP(big.df), josilber(big.df))
# [1] TRUE
# Benchmark
library(microbenchmark)
microbenchmark(OP(big.df), josilber(big.df))
# Unit: milliseconds
#             expr      min        lq      mean    median        uq      max neval
#       OP(big.df) 299.6265 311.56167 352.26841 318.51825 348.09461 540.0971   100
# josilber(big.df)  40.4256  48.66967  60.72864  53.18471  59.72079 267.3886   100
The approach with vector indexing is about 6x faster in median runtime (roughly 53 ms versus 319 ms).