
If you only have a few rules in a discrete event simulation this is not critical, but if you have many rules that can interfere with each other, you may want to track which rules are used and where.

  • Does anybody know how to get the code below as fast as the original function?
  • Are there better options than eval(parse(...))?

Here is a simple example which shows that I lose a factor of 100 in speed. Assume you run a simulation and one (of many) rules is: select the states with time less than 5:

> a <- rnorm(100, 50, 10)
> print(summary(microbenchmark::microbenchmark(a[a < 5], times = 1000L, unit = "us")))
   expr  min   lq     mean median   uq    max neval
a[a < 5] 0.76 1.14 1.266745  1.141 1.52 11.404  1000

myfun <- function(a0) {
  return(eval(parse(text = myrule)))
}

> myrule <- "a < a0" # The rule could be read from a file.
> print(summary(microbenchmark::microbenchmark(a[myfun(5)], times = 1000L, unit = "us")))
    expr    min      lq     mean  median      uq     max neval
a[myfun(5)] 137.61 140.271 145.6047 141.411 142.932 343.644  1000

Note: I don't think that I need an extra rete package to do the bookkeeping efficiently. But if there are other opinions, let me know...

Christoph

1 Answer


Let's profile this:

Rprof()
for (i in 1:1e4) a[myfun(5)]
Rprof(NULL)
summaryRprof()

#$by.self
#             self.time self.pct total.time total.pct
#"parse"           0.36    69.23       0.48     92.31
#"structure"       0.04     7.69       0.06     11.54
#"myfun"           0.02     3.85       0.52    100.00
#"eval"            0.02     3.85       0.50     96.15
#"stopifnot"       0.02     3.85       0.06     11.54
#"%in%"            0.02     3.85       0.02      3.85
#"anyNA"           0.02     3.85       0.02      3.85
#"sys.parent"      0.02     3.85       0.02      3.85
#
#$by.total
#               total.time total.pct self.time self.pct
#"myfun"              0.52    100.00      0.02     3.85
#"eval"               0.50     96.15      0.02     3.85
#"parse"              0.48     92.31      0.36    69.23
#"srcfilecopy"        0.12     23.08      0.00     0.00
#"structure"          0.06     11.54      0.04     7.69
#"stopifnot"          0.06     11.54      0.02     3.85
#".POSIXct"           0.06     11.54      0.00     0.00
#"Sys.time"           0.06     11.54      0.00     0.00
#"%in%"               0.02      3.85      0.02     3.85
#"anyNA"              0.02      3.85      0.02     3.85
#"sys.parent"         0.02      3.85      0.02     3.85
#"match.call"         0.02      3.85      0.00     0.00
#"sys.function"       0.02      3.85      0.00     0.00

Most of the time is spent in parse. We can confirm this with a benchmark:

microbenchmark(a[myfun(5)], times = 1000L, unit = "us")
#Unit: microseconds
#        expr    min     lq     mean median     uq     max neval
# a[myfun(5)] 67.347 69.141 72.12806 69.909 70.933 160.303  1000

a0 <- 5
microbenchmark(parse(text = myrule), times = 1000L, unit = "us")
#Unit: microseconds
#                 expr    min     lq     mean median     uq     max neval
# parse(text = myrule) 62.483 64.275 64.99432 64.787 65.299 132.903  1000

If reading the rules as text from a file is a hard requirement, I don't think there is a way to speed this up. Of course, you should not parse the same rule repeatedly, but I assume you know that.
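For example, a minimal sketch of parsing each rule once and reusing the stored expression (the names `parsed_rule` and `myfun_cached` are mine, not from the answer):

```r
# Parse the rule text once when it is read from the file,
# then reuse the stored expression on every call instead of re-parsing.
a <- rnorm(100, 50, 10)
myrule <- "a < a0"                        # as read from a file
parsed_rule <- parse(text = myrule)[[1]]  # parse once, keep the expression

# eval()'s default envir is parent.frame(), so a0 is found in this
# function's frame, just as in the original myfun.
myfun_cached <- function(a0) eval(parsed_rule)

a[myfun_cached(5)]  # same result as a[a < 5], without the per-call parse cost
```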

Edit in response to a comment providing more explanation:

You should store your rules as quoted expressions (e.g., in a list using saveRDS if you need them as a file):

myrule1 <- quote(a < a0)
myfun1 <- function(rule, a, a0) {eval(rule)}

microbenchmark(a[myfun1(myrule1, a, 30)], times = 1000L, unit = "us")
#Unit: microseconds
#                      expr   min    lq     mean median    uq    max neval
# a[myfun1(myrule1, a, 30)] 1.792 2.049 2.286815  2.304 2.305 30.217  1000

For convenience, you could then make that list of expressions an S3 object and create a nice print method for it in order to get a better overview.
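One way this could look (a sketch only; the names `rules`, `ruleset`, and the rule names are illustrative, not from the answer):

```r
# Store the quoted rules in a named list and give it a class,
# so we can attach a tabular print method for a quick overview.
rules <- structure(
  list(low_time  = quote(a < a0),
       high_time = quote(a > a1)),
  class = "ruleset"
)

print.ruleset <- function(x, ...) {
  # one row per rule: its name and the deparsed expression
  df <- data.frame(rule       = names(x),
                   expression = vapply(x, deparse, character(1)))
  print(df, row.names = FALSE)
  invisible(x)
}

print(rules)  # prints a two-column table of rule names and expressions
```

The rules themselves stay ordinary quoted expressions, so they can still be passed to `myfun1` (e.g. `rules$low_time`) and serialized with `saveRDS`.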

Roland
  • I understand. Do you have an idea, how to track the "which" and "where" in a different way - not using a file as intended above? – Christoph Jun 15 '16 at 07:35
  • I don't understand `to track the "which" and "where" in a different way`. Your problem description seems to be missing critical parts. Do you have your rules in some other form than as a text file? – Roland Jun 15 '16 at 07:45
  • I run discrete event simulations. At the moment the rules are spread as `if-else` statements all over the code. The more rules we add, the more conflicts between rules can occur and the whole book keeping gets more and more complicated (Even simple questions like: Didn't we implement a similar rule some months ago? Where?...) So I thought there must be a smart way (that is not an extra document;-) to deal with that. The way above would solve those problems at the cost of speed which unfortunately is not an option. – Christoph Jun 15 '16 at 07:59
  • Sorry, I am quite new to R and never heard about that: You mean `saveRDS(myrule1, file = "myrule1.rds")`, `res <- readRDS(file = "myrule1.rds")`. What do you mean by `S3`object? Is there a better way than `print(res)` which prints `a < a0` – Christoph Jun 15 '16 at 09:04
  • 1
    I mean for printing `list(myrule1, myrule2)` a tabular form would be nicer. See for an explanation of OOP in R: http://adv-r.had.co.nz/S3.html – Roland Jun 15 '16 at 09:09