For loops aren't necessarily slow in R. What can be slow is calling a set of functions a very large number of times (and with more recent versions of R, even that isn't as slow as it used to be). However, for loops can often be avoided completely by using vectorised code, which is often many times faster.
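To illustrate the difference, here is a minimal sketch (the vector x is made up for the example) that counts, for every element, how many elements share its value. The loop calls == and sum() once per element. The vectorised version assigns each value a group id with match() and counts every group in a single tabulate() call.

```r
x <- c(11L, 11L, 2L, 3L, 11L)

# Loop: one call to `==` and sum() per element
loop_counts <- integer(length(x))
for (i in seq_along(x)) loop_counts[i] <- sum(x == x[i])

# Vectorised: map each value to a group id, count each group once,
# then look up the count of every element's group
ux <- unique(x)
g <- match(x, ux)
vec_counts <- tabulate(g)[g]

print(vec_counts)  # 3 3 1 1 3
```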
In general, using eval and parse is not needed and is usually an indication that a suboptimal solution is being used. In this case (without knowing the complete problem), I am not completely sure how to avoid them. However, by writing the loops more efficiently, a speed-up of more than a factor of 20 can be achieved without using Rcpp.
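If the set of conditions is known in advance, one way to avoid eval and parse is to store each condition as a function of the data and a row index rather than as a string. This is a sketch under that assumption, not the code from the question. The list names ("A & B" and so on) are hypothetical labels, and the small DF is just the four template rows.

```r
# Each condition is an ordinary R function, so nothing has to be parsed
conds <- list(
  "A & B"     = function(d, i) d$A == d$A[i] & d$B == d$B[i],
  "A & C"     = function(d, i) d$A == d$A[i] & d$C == d$C[i],
  "B & C"     = function(d, i) d$B == d$B[i] & d$C == d$C[i],
  "A & B & C" = function(d, i) d$A == d$A[i] & d$B == d$B[i] & d$C == d$C[i]
)

DF <- data.frame(A = c(11L, 11L, 2L, 3L),
                 B = c(22L, 22L, 30L, 30L),
                 C = c(88L, 47L, 21L, 21L))

# For every condition, count the matching rows for each row i
counts <- lapply(conds, function(cond) {
  vapply(seq_len(nrow(DF)), function(i) sum(cond(DF, i)), integer(1))
})
```

The loop structure stays the same as in the question; only the string-to-code step disappears.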
Generate data
r <- c("A==A[i] & B==B[i]", "A==A[i] & C==C[i] ", "B==B[i] & C==C[i] ",
"A==A[i] & B==B[i] & C==C[i] ")
DF <- read.table(textConnection(" A B C
1 11 22 88
2 11 22 47
3 2 30 21
4 3 30 21"))
DF <- DF[sample(nrow(DF), 1E3, replace=TRUE), ]
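Because the rows are drawn with sample(), the data (and therefore the counts) differ between runs. A minimal sketch for making them reproducible: set a seed before sampling. The seed value 123 is arbitrary, and read.table's text argument is used here in place of textConnection().

```r
set.seed(123)  # arbitrary fixed seed so sample() draws the same rows every run

template <- read.table(header = TRUE, text = "
A B C
11 22 88
11 22 47
2 30 21
3 30 21")

# 1000 rows, each a copy of one of the four template rows
DF <- template[sample(nrow(template), 1e3, replace = TRUE), ]
```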
Time the initial implementation:
> system.time({
+ output2=list()
+ for (j in r){
+ for (i in 1:nrow(DF)){
+ output2[[j]][i]=nrow(subset(DF,eval(parse(text=j))))
+ }
+ }
+ })
user system elapsed
1.120 0.007 1.127
Preallocating the result doesn't help much in this case:
> system.time({
+ output2=vector(length(r), mode = "list")
+ names(output2) <- r
+ for (j in r){
+ output2[[j]] <- numeric(nrow(DF))
+ for (i in 1:nrow(DF)){
+ output2[[j]][i]=nrow(subset(DF,eval(parse(text=j))))
+ }
+ }
+ })
user system elapsed
1.116 0.000 1.116
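Preallocation mainly pays off when a vector is grown one element at a time, because each growth step can trigger a copy (recent R versions mitigate this somewhat). In the loops above the cost is dominated by subset() and parse(), which is why the timings barely change. A minimal sketch of the two patterns, with n an arbitrary size chosen for illustration:

```r
n <- 1e4

# Growing: the vector is extended on every iteration
grown <- c()
for (i in seq_len(n)) grown[i] <- i^2

# Preallocated: memory for all n results is reserved up front
prealloc <- numeric(n)
for (i in seq_len(n)) prealloc[i] <- i^2
```

Both produce the same vector; of course, for this toy computation the vectorised (seq_len(n))^2 avoids the loop entirely.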
subset is not needed, as we only need the number of rows. subset creates a completely new data.frame, which adds overhead. Summing the logical condition, evaluated with DF as the environment, gives the count directly:
> system.time({
+ output2=vector(length(r), mode = "list")
+ names(output2) <- r
+ for (j in r){
+ output2[[j]] <- numeric(nrow(DF))
+ for (i in 1:nrow(DF)){
+ output2[[j]][i]=sum(eval(parse(text=j), envir = DF))
+ }
+ }
+ })
user system elapsed
0.622 0.003 0.626
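The two ways of counting agree as long as the condition contains no NA. subset() keeps only rows where the condition is TRUE and silently drops NA, whereas sum() returns NA unless na.rm = TRUE is given. A small check, using made-up data with one missing value:

```r
DF <- data.frame(A = c(11L, 11L, 2L, NA),
                 B = c(22L, 22L, 30L, 22L))
i <- 1

cond <- with(DF, A == A[i] & B == B[i])  # TRUE TRUE FALSE NA

n_subset <- nrow(subset(DF, cond))   # builds a new data.frame, then counts its rows
n_sum    <- sum(cond, na.rm = TRUE)  # counts the TRUE values directly
```

Both give 2 here, while sum(cond) without na.rm gives NA.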
Parsing r takes time and is repeated nrow(DF) times, so move it out of the inner loop:
> system.time({
+ output2=vector(length(r), mode = "list")
+ names(output2) <- r
+ for (j in r){
+ output2[[j]] <- numeric(nrow(DF))
+ expr <- parse(text=j)
+ for (i in 1:nrow(DF)){
+ output2[[j]][i]=sum(eval(expr, envir = DF))
+ }
+ }
+ })
user system elapsed
0.054 0.000 0.054
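The same idea can be wrapped in a function that parses every condition exactly once and fills the results with vapply() instead of indexing into a growing list. A sketch, assuming R >= 3.6.0 for str2lang(); count_matches() is a hypothetical name. eval() looks up the column names in df and finds i in the calling function.

```r
count_matches <- function(df, conds) {
  exprs <- lapply(conds, str2lang)  # parse each condition string only once
  names(exprs) <- conds
  lapply(exprs, function(e) {
    vapply(seq_len(nrow(df)), function(i) {
      # columns come from df; `i` is found in this anonymous function's frame
      sum(eval(e, envir = df))
    }, numeric(1))
  })
}

DF <- data.frame(A = c(11L, 11L, 2L, 3L),
                 B = c(22L, 22L, 30L, 30L),
                 C = c(88L, 47L, 21L, 21L))
res <- count_matches(DF, c("A==A[i] & B==B[i]", "A==A[i] & B==B[i] & C==C[i]"))
```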
A more readable and even faster implementation using dplyr (here the counts are added as columns a, b, c and d of output3 instead of being stored in a list):
> library(dplyr)
> system.time({
+ output3 <- DF %>% group_by(A,B) %>% mutate(a = n()) %>%
+ group_by(A,C) %>% mutate(b = n()) %>%
+ group_by(B,C) %>% mutate(c = n()) %>%
+ group_by(A,B,C) %>% mutate(d = n())
+ })
user system elapsed
0.010 0.000 0.009
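If adding dplyr as a dependency is not wanted, the same per-row group counts can be computed in base R with ave(), which splits a vector by the grouping columns and applies a function within each group. A sketch, assuming the four groupings from r; count_groups() and the column names a to d are hypothetical, chosen to mirror the dplyr version. Its speed relative to the versions above is not measured here.

```r
count_groups <- function(df, groupings) {
  for (nm in names(groupings)) {
    cols <- groupings[[nm]]
    # for every row, the size of the group sharing its values in `cols`
    df[[nm]] <- ave(rep(1L, nrow(df)), df[cols], FUN = length)
  }
  df
}

DF <- data.frame(A = c(11L, 11L, 2L, 3L),
                 B = c(22L, 22L, 30L, 30L),
                 C = c(88L, 47L, 21L, 21L))

res <- count_groups(DF, list(a = c("A", "B"), b = c("A", "C"),
                             c = c("B", "C"), d = c("A", "B", "C")))
```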