I just got a gig to help speed up an R program by improving the efficiency of the algorithms it uses for its calculations. There are many loops that do different calculations, and I'm wondering which of them use the most resources. How can I measure the time it takes for a loop to finish? I can use that information to decide which algorithms to optimize, or even to write a C extension to handle the calculations.
3 Answers
You can use:
- `Sys.time()` or `system.time()`
- The `rbenchmark` package
- The `microbenchmark` package
- Or a profiler (e.g. `?Rprof`)
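As a minimal sketch of the first option, the snippet below times a loop both ways using only base R. The function `my_loop` is a hypothetical stand-in for whatever loop you want to measure; it is not from the original question.

```r
# Hypothetical stand-in for one of the loops being timed.
my_loop <- function(n = 1e5) {
  total <- 0
  for (i in seq_len(n)) total <- total + sqrt(i)
  total
}

# Option 1: take the difference of two wall-clock timestamps.
beg <- Sys.time()
result <- my_loop()
elapsed_wall <- Sys.time() - beg   # a "difftime" object
print(elapsed_wall)

# Option 2: let system.time() do the bookkeeping.
# It reports user, system, and elapsed times in seconds.
timing <- system.time(result <- my_loop())
print(timing[["elapsed"]])
```

`system.time()` is usually the more convenient of the two because it also separates CPU time from elapsed time.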

- Do you list them in order? – agstudy Dec 04 '12 at 18:46
- For the first bullet point, you can use `beg <- Sys.time(); { MyCode() }; Sys.time() - beg` _or_ `system.time({ MyCode() })`, which is probably preferable. – GSee Dec 04 '12 at 18:59
I use `Rprof` to tell where to look.
It generates a file of stack samples, and I just look at a small number of those, like 10, chosen randomly.
Or I just make the time between samples large enough so I don't get too many samples to begin with.
There are 2 reasons this works.
1) By actually examining individual stack samples, with your own eyes, you can see problems that simple statistics don't expose, because by looking at the stack, you can see the reasons why things are being done. That tells you if you could get rid of it, and that's the essential information.
2) If you see such a pattern of activity that you could improve, you only have to see it on more than one sample to know it's worth fixing. All the extra samples, if they mean you cannot do (1), are actually detrimental.
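The approach described above might look roughly like the following sketch. It assumes an R build with profiling support (the default for standard builds); `slow_loop` is a hypothetical workload, and the 0.05 second interval is an arbitrary choice to keep the number of samples small.

```r
# Hypothetical workload standing in for the real loops.
slow_loop <- function(n = 200) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sum(sort(runif(1e5)))
  }
  total
}

prof_file <- tempfile(fileext = ".out")

# A long sampling interval (in seconds) keeps the number of samples small.
Rprof(prof_file, interval = 0.05)
invisible(slow_loop())
Rprof(NULL)  # stop profiling

# The first line of the file is a header; each remaining line is one
# stack sample, with the innermost call listed first.
stacks <- readLines(prof_file)[-1]

# Pick up to 10 samples at random and read them by eye.
picked <- if (length(stacks) > 0) {
  sample(stacks, min(10, length(stacks)))
} else {
  character(0)
}
print(picked)
```

For an aggregated view of the same file, `summaryRprof(prof_file)` is available, though the point of this answer is to read the individual samples.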

Here is an example of using `benchmark`, taken from another SO question, which compared `tapply` vs `by` vs `data.table` (edited as per the comments):
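library(rbenchmark)
library(data.table)  # needed for the dt[...] syntax
# Assumes x (with a grouping column "f") and dt (a keyed data.table with a column col1) already exist
# Different tests being compared
benchmark(
  using.tapply = tapply(x[, 1], x[, "f"], mean),
  using.by     = by(x[, 1], x[, "f"], mean),
  using.dtable = dt[, mean(col1), by = key(dt)],
  # Number of replications, and how to order the results
  replications = 250, order = "relative"
)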
#------------------------------------------#
#                 RESULTS                  #
#------------------------------------------#
#   COMPARING data.table VS tapply VS by   #
#------------------------------------------#
#           test elapsed relative
# 2 using.dtable   0.168    1.000
# 1 using.tapply   2.396   14.262
# 3     using.by   8.566   50.988
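Since `x` and `dt` are not defined in the answer, here is a rough, self-contained sketch of the same kind of comparison using only base R. The data and the helper `time_reps` are made up for illustration; this is not how `rbenchmark` is implemented, just the same idea of repeating each test and reporting elapsed time relative to the fastest.

```r
# Hypothetical data: 100,000 values in 100 groups.
set.seed(1)
x <- data.frame(col1 = runif(1e5),
                f = factor(sample(100, 1e5, replace = TRUE)))

# Elapsed seconds for `reps` evaluations of a function.
time_reps <- function(fun, reps = 10) {
  unname(system.time(for (i in seq_len(reps)) fun())["elapsed"])
}

elapsed <- c(
  using.tapply = time_reps(function() tapply(x[, 1], x[, "f"], mean)),
  using.by     = time_reps(function() by(x[, 1], x[, "f"], mean))
)

# Report each test relative to the fastest one.
results <- data.frame(test = names(elapsed),
                      elapsed = elapsed,
                      relative = round(elapsed / min(elapsed), 3),
                      row.names = NULL)
print(results[order(results$relative), ])
```

Timings will vary by machine, so no expected output is shown.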

- @GSee, I like to save them as expressions so that I can easily change the input once across all my tests. (Also, I find it gives a nicer output in the `test` column.) – Ricardo Saporta Dec 04 '12 at 18:47
- `benchmark(using.tapply=tapply(x[, 1], x[, "f"], mean), using.by=by(x[, 1], x[, "f"], mean), using.dtable=dt[, mean(col1),by=key(dt)], replications=10, order='relative')` ... whatever `x` and `dt` are – GSee Dec 04 '12 at 18:49
- Just name the `...` arguments: `benchmark(using.tapply=tapply(x[, 1], x[, "f"], mean))`. – Joshua Ulrich Dec 04 '12 at 18:50
- Yes, of course. But if I have 10~15 tests I am comparing, taking the actual code in and out is a lot more cumbersome than using a variable. I find that wrapping them in `expression` goes more quickly. Is there any downside to using `expression`? (i.e., any risk of getting the wrong results?) – Ricardo Saporta Dec 04 '12 at 18:52
- Put each test on its own line. Then you can easily add/remove them. The biggest problem I see with using `expression` is that you're probably going to confuse others who look at your code. – Joshua Ulrich Dec 04 '12 at 18:59
- Plus `expression` isn't even the right function to use here - `quote` would be more appropriate. – hadley Dec 04 '12 at 21:03
- @RicardoSaporta `expression` produces (basically) a list of calls. Expression lists are rarely needed outside of special cases, such as sourcing a file, and you're best off using the simplest quoted call, as produced by `quote`. – hadley Dec 04 '12 at 21:12
- @hadley, please correct me if I'm wrong, but it appears that `expression` is indeed the way to go: http://stackoverflow.com/questions/13713116/benchmarking-using-expression-quote-or-neither – Ricardo Saporta Dec 04 '12 at 22:32
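
For readers following the `expression` vs `quote` exchange, this small base R sketch shows what each function returns. It only illustrates the difference between the two objects and does not settle which one `benchmark` handles better; the call `sum(1:10)` is an arbitrary example.

```r
q <- quote(sum(1:10))        # a single unevaluated call
e <- expression(sum(1:10))   # an expression vector holding that call

print(class(q))              # "call"
print(class(e))              # "expression"
print(length(e))             # 1: the vector holds one call

# The element stored inside the expression vector is the same call
# that quote() produces.
same_call <- identical(e[[1]], q)

# Evaluating either one runs the underlying code.
q_value <- eval(q)
e_value <- eval(e)
print(c(q_value, e_value))   # both 55
```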