I was making some code optimizations and noticed that the `repeat`, `while` and `for` constructs seem to have considerable startup overhead. Here is an example:

microbenchmark::microbenchmark(
  {lapply(1:10, function(x) x)},
  {i <- 0; while (i < 10) i <- i + 1},
  {for (i in 1:10) i},
  {i <- 0; repeat if ((i <- i + 1) >= 10) break}
, times = 1000)
Unit: microseconds
                                            expr    min      lq     mean  median     uq      max neval
                 { lapply(1:10, function(x) x) }    7.4   12.80   15.101   14.30   15.5    919.9  1000
           { i <- 0; while (i < 10) i <- i + 1 } 1377.8 1431.85 1830.941 1475.10 1537.1  68344.9  1000
                           { for (i in 1:10) i }  838.0  880.00 1008.845  904.45  950.6  56744.9  1000
{ i <- 0; repeat if ((i <- i + 1) >= 10) break } 2092.4 2190.05 3265.421 2248.45 2343.2 467666.1  1000

If you use these constructs many times (e.g. within an outer loop), that roughly 100x slowdown adds up considerably!
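
To make that concrete, here is a rough sketch of the scenario I have in mind (hypothetical code, not my real use case):

# Hypothetical scenario: a short while-loop executed by a large outer loop.
# If each inner loop really carried ~1.5 ms of startup overhead (as measured
# above), 1e4 outer iterations would already add up to roughly 15 seconds.
outer_n <- 1e4L
system.time(
  for (k in seq_len(outer_n)) {
    i <- 0
    while (i < 10) i <- i + 1
  }
)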

This seems contrary to my belief (and that of many others). Am I missing something?

Edit: since I didn't see any such effect in my real code, it made me wonder...

f <- function(x) { i <- 0; repeat if ((i <- i + 1) >= x) break }
microbenchmark::microbenchmark(
  f(10L)  
, times = 1000)
Unit: nanoseconds
   expr min  lq  mean median  uq   max neval
 f(10L) 700 800 898.9    900 900 10700  1000

Note the nanoseconds! These are the times I expect from these constructs (e.g. mainly the overhead due to `<-`, `+` and `>=`). So it seems to be a problem with microbenchmark?
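
If the function wrapper is what matters, the same trick can be applied to all four constructs so they are compared on an equal footing (just a sketch; the `f_*` names are mine, made up for illustration):

# Wrap each construct in a function so that microbenchmark measures execution
# of already-defined code rather than the raw braced expressions.
f_lapply <- function(n) lapply(seq_len(n), function(x) x)
f_while  <- function(n) { i <- 0; while (i < n) i <- i + 1 }
f_for    <- function(n) for (i in seq_len(n)) i
f_repeat <- function(n) { i <- 0; repeat if ((i <- i + 1) >= n) break }

microbenchmark::microbenchmark(
  f_lapply(10L),
  f_while(10L),
  f_for(10L),
  f_repeat(10L),
  times = 1000
)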

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Davor Josipovic
  • @markus Note that the question is about the startup overhead of these constructs. A `C` function call in R takes about 0.1 microseconds; I don't expect a `while` statement to take 1000x longer. The bigger picture is that if these constructs are used for short loops (e.g. `seq_len(10)`), then their overhead might add up considerably if the outer loop executes them on the order of `1e6` times. – Davor Josipovic May 05 '20 at 21:40
  • FYI, two of your expressions here do *a little* more work: if you take the sequence operator `:` out, it's a fairer comparison (though not dramatically different). – r2evans May 05 '20 at 23:42
  • @r2evans Yes, I was thinking about that. But `:` is one of those `C` calls: it takes about 0.1 microseconds, same as `+`, `<-` and `>=`. You shouldn't see any effect. – Davor Josipovic May 06 '20 at 07:05
  • @DavorJosipovic my guess would be that we are seeing JIT (just-in-time) compilation taking effect: in your first examples the code is recompiled each time, while in the second example it is only executed. But I am no expert in this... – minem May 06 '20 at 07:27 (see the sketch after these comments)
  • When I run the original example code, I don't see the same level of slowness: the fastest is `for` with a mean of 1184 nanosecs and the slowest is `lapply` with 10253 nanosecs. I'm running R 3.2.1 using RStudio. – Dominic van Essen May 06 '20 at 07:41
  • Follow-up: command line vs RStudio doesn't seem to make a difference, but I have now run it on a server running R 3.6.2 (through the command line) and I indeed get a similar slowness to Davor's: `lapply` mean 32 microsecs, `repeat` 5645 microsecs. So it may arise from an 'optimization' in later versions of R. – Dominic van Essen May 06 '20 at 07:47
  • @DominicvanEssen Good point. I added my platform info. – Davor Josipovic May 06 '20 at 08:09
  • @r2evans, I was thinking about that too, but consider this: `microbenchmark::microbenchmark((function(x) { i <- 0; repeat if ((i <- i + 1) >= x) break })(10))`. – Davor Josipovic May 06 '20 at 08:12
  • Note (in R 3.6.2) that the dreaded add-up of overheads doesn't seem to happen: `{ for(j in 1:10) j }` runs in 2.2 ms, but `{ for(i in 1:1000) { for(j in 1:10) j } }` runs in 5.0 ms (only 2x slower for 1000x the iterations). `lapply` behaves more conventionally: 1000x iterations takes ~1000x longer. – Dominic van Essen May 06 '20 at 09:13
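
To probe minem's JIT guess above, one possible experiment (a sketch on my part, not something run in the thread) is to disable R's byte-code compiler and re-benchmark the wrapped function:

# If JIT compilation explains the fast f(10L) timing, then defining the same
# function while the JIT is disabled should give much slower timings, closer
# to the raw loop expressions benchmarked at the top.
library(compiler)

old <- enableJIT(0)   # turn off just-in-time byte-compilation
f_nojit <- function(x) { i <- 0; repeat if ((i <- i + 1) >= x) break }
print(microbenchmark::microbenchmark(f_nojit(10L), times = 1000))

enableJIT(old)        # restore the previous JIT level (default is 3)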

0 Answers