
I want to take some performance measurements (mainly runtime) of my Java code, a single-threaded, local, complex algorithm. (So I do not want a macro-benchmark that measures a JVM implementation.)

With the tool, I would like to

  • analyse the complexity, i.e. see how my code scales with a parameter n (the search depth). (I already have a JUnit test parameterized in n.)
  • do some trend analysis to get warned if some change to the code base makes the code slower.
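To make the measurement goal concrete, this is the kind of raw timing loop I could hand-roll myself (pure JDK; `search(n)` is just a hypothetical stand-in for my algorithm); it lacks all the statistics I am asking for, which is why I want a tool:

```java
import java.util.Arrays;

/** Hand-rolled timing loop: runtime vs. parameter n (no real statistics yet). */
public class ScalingProbe {

    /** Hypothetical stand-in for the algorithm under test. */
    static long search(int n) {
        long acc = 0;
        for (int i = 0; i < n * n; i++) acc += i; // dummy O(n^2) workload
        return acc;
    }

    /** Median wall-clock time in nanoseconds over several runs, after a warmup. */
    static long medianNanos(int n, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) search(n); // let the JIT settle
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            search(n);
            samples[i] = System.nanoTime() - t0;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 200, 400, 800}) {
            System.out.printf("n=%d  median=%d ns%n", n, medianNanos(n, 5, 11));
        }
    }
}
```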

For this, I would like to use a tool or framework that

  • does the statistics, optimally computing the mean value, standard deviation and confidence intervals. This is very important.
  • can be parameterized (see parameter n above). This is also very important.
  • is able to produce a fancy plot (this would be nice, but is not required)
  • can be used in an automated (JUnit) test to warn me if my program slows down (also not required, just a plus).

What tools/frameworks fulfill these requirements? Which one would be well suited for complexity and trend analysis, and why?

DaveFar
  • there is no tag "macrobenchmark" yet (though there is microbenchmark). Could somebody with sufficient rights add this tag please? – DaveFar Aug 22 '11 at 10:37
  • Take a look at: http://jetm.void.fm/index.html – khmarbaise Aug 22 '11 at 10:41
  • Thanks khmarbaise, I've just started to use JETM, because it's pretty lightweight but still offers many things I wanted. Unfortunately, the statistics are quite weak, but maybe I can add a plug-in to improve that. – DaveFar Aug 23 '11 at 17:20
  • I found a blog-entry about Runtime monitoring libraries for Java at http://day-to-day-stuff.blogspot.com/2009/01/runtime-monitoring-libraries-for-java.html. It covers Jamon, Java Simon, Usemon, Moskito, Commons monitoring, JETM, and Project Broadway. But for each tool, only a very short summary is given. – DaveFar Aug 23 '11 at 17:20
  • JETM does not seem to be easily extensible for more complex statistical results: The Aggregate interface, which delivers the results, is fixed to specific values (getAverage, getMax, getMin). So extensions would have to permeate through the complete library :( – DaveFar Aug 23 '11 at 22:05
  • Brent Boyer's benchmarking framework mentioned above ( http://www.ellipticgroup.com/misc/projectLibrary.zip) does look quite nice, but depends on about a dozen 3rd party libraries :( – DaveFar Aug 23 '11 at 22:19
  • Just some clarification to make it more clear. You have an algo which can work at n depth levels and you want to measure the execution performance vs depth. At the same time, you want to analyze it for code sections which contribute max with increasing depth. Is this understanding of mine correct ? – Manish Singh Aug 30 '11 at 07:32

3 Answers


Below is an alphabetical list of all the tools I found. The aspects mentioned are:

  • is it easily parameterizable
  • is it a Java library, or at least easy to integrate into your Java program
  • can it handle JVM micro benchmarking, e.g. use a warmup phase
  • can it plot the results visually
  • can it store the measured values persistently
  • can it do trend analysis to warn that a new commit caused a slow down
  • does it provide and use statistics (at least max, min, average and standard deviation).
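To make the trend-analysis aspect concrete, here is a minimal plain-Java (11+) sketch (hypothetical code, not taken from any of the tools below): persist a baseline and warn when the current measurement regresses beyond a tolerance.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch of trend analysis: compare against a persisted baseline. */
public class TrendCheck {

    /** Creates a path for a fresh baseline file (no baseline recorded yet). */
    static Path tempBaseline() {
        try {
            Path p = Files.createTempFile("baseline", ".txt");
            Files.delete(p); // start without a recorded baseline
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if currentNanos exceeds the stored baseline by more than tolerance
     *  (e.g. 0.1 = warn when more than 10% slower); stores a baseline on first run. */
    static boolean regressed(Path baselineFile, double currentNanos, double tolerance) {
        try {
            if (!Files.exists(baselineFile)) {
                Files.writeString(baselineFile, Double.toString(currentNanos));
                return false;
            }
            double baseline = Double.parseDouble(Files.readString(baselineFile).trim());
            return currentNanos > baseline * (1 + tolerance);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```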

Auto-pilot

parameterizable; Perl library; no JVM micro benchmarking; plotting; persistence; trend analysis!?; good statistics (run a given test until results stabilize; highlight outliers).

Benchmarking framework

not parameterizable; Java library; JVM micro benchmarking; no plotting; no persistence; no trend analysis; statistics.

Handles the statistics extremely well: besides average, max, min and standard deviation, it also computes the 95% confidence interval (via bootstrapping) and the serial correlation (e.g. to warn about oscillating execution times, which can occur if your program behaves nondeterministically, e.g. because it uses HashSets). It decides how often the program has to be iterated to get accurate measurements, and interprets the measurements for reporting and warnings (e.g. about outliers and serial correlation).
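For illustration, the basic statistics could be computed in plain Java like this (a sketch only, using the normal approximation for the confidence interval; the framework itself uses bootstrapping):

```java
/** Summary statistics for a series of timing samples (illustrative sketch). */
public class RunStats {
    final double mean, stdDev, ciLow, ciHigh;

    RunStats(double[] samples) {
        int n = samples.length;
        double sum = 0;
        for (double s : samples) sum += s;
        mean = sum / n;
        double sq = 0;
        for (double s : samples) sq += (s - mean) * (s - mean);
        stdDev = Math.sqrt(sq / (n - 1));                // sample standard deviation
        double halfWidth = 1.96 * stdDev / Math.sqrt(n); // 95% CI, normal approximation
        ciLow = mean - halfWidth;
        ciHigh = mean + halfWidth;
    }
}
```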

Also does the micro-benchmarking extremely well (see Create quick/reliable benchmark with java? for details).

Unfortunately, the framework comes in a util-package bundled together with a lot of other helper-classes. The benchmark classes depend on JSci (A science API for Java) and Mersenne Twister (http://www.cs.gmu.edu/~sean/research/). If the author, Brent Boyer, finds time, he will boil the library down and add a simpler grapher so that the user can visually inspect the measurements, e.g. for correlations and outliers.

Caliper

parameterizable; Java library; JVM micro benchmarking; plotting; persistence; no trend analysis; statistics.

Relatively new project, tailored towards Android apps. Looks young but promising. Depends on Google Guava :(

Commons monitoring

not parameterizable!?; Java library; no JVM micro benchmarking!?; plotting; persistence through a servlet; no trend analysis!?; no statistics!?.

Supports AOP instrumentation.

JAMon

not parameterizable; Java library; no JVM micro benchmarking; plotting, persistence and trend analysis with additional tools (Jarep or JMX); statistics.

Good monitoring, intertwined with log4j; data can also be accessed or queried programmatically, and your program can take actions on the results.

Java Simon

not parameterizable!?; Java library; no JVM micro benchmarking; plotting only with Jarep; persistence only with JMX; no trend analysis; no statistics!?.

Competitor of Jamon, supports a hierarchy of monitors.

JETM

not parameterizable; Java library; JVM micro benchmarking; plotting; persistence; no trend analysis; no statistics.

Nice lightweight monitoring tool, no dependencies :) Does not offer sufficient statistics (no standard deviation), and extending it correspondingly looks quite difficult (Aggregators and Aggregates only have fixed getters for min, max and average).

jmeter

parameterizable!?; Java library; no JVM micro benchmarking!?; plotting; persistence; trend analysis!?; statistics!?.

Good monitoring library that is tailored towards load testing web applications.

Java Microbenchmark Harness (jmh)

parameterizable (custom invokers via Java API); Java library; JVM micro benchmarking; no plotting; no persistence; no trend analysis; statistics.

The benchmarking harness built by Oracle's HotSpot experts, thus very suitable for microbenchmarking on HotSpot; used in OpenJDK performance work. Extreme measures are taken to provide a reliable benchmarking environment. Besides human-readable output, jmh provides a Java API to process the results, e.g. for 3rd-party plotters and persistence providers.

junit-Benchmarks

parameterizable; Java library; JVM micro benchmarking; plotting; persistence (using CONSOLE, XML or database H2); graphical trend analysis; statistics (max, min, average, standard deviation; but not easily extensible for further statistics).

Simply add a junit-4-rule to your junit tests :)

junit-Benchmarks is open source, under the Apache 2 licence.

Update: project moved to jmh

junitperf

Mainly for doing trend analysis for performance (with the JUnit test decorator TimedTest) and scalability (with the JUnit test decorator LoadTest).

parameterizable; Java library; no JVM micro benchmarking; no plotting; no persistence; no statistics.

perf4j

not parameterizable; Java library; no JVM micro benchmarking; plotting; persistence via JMX; trend analysis via a log4j appender; statistics.

Builds upon a logging framework, can use AOP.

Project Broadway

Very general concept: monitors observe predefined conditions and specify how to react when they are met.

speedy-mcbenchmark

Main focus is on parameterizability: check whether your algorithm scales, i.e. check if it's O(n), O(n log(n)), O(n²)...

Java library; JVM micro benchmarking; no plotting; persistence; trend analysis; no statistics.
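The kind of scaling check described here can be sketched by fitting the slope of log(time) against log(n): a slope near 1 suggests O(n), near 2 suggests O(n²). (My own illustration, not speedy-mcbenchmark's actual code.)

```java
/** Least-squares estimate of the exponent k in time ≈ c * n^k (log-log fit). */
public class ScalingExponent {
    static double estimate(long[] ns, double[] times) {
        int m = ns.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < m; i++) {
            double x = Math.log(ns[i]);
            double y = Math.log(times[i]);
            sx += x; sy += y; sxx += x * x; sxy += x * y;
        }
        return (m * sxy - sx * sy) / (m * sxx - sx * sx); // regression slope
    }
}
```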

The Grinder

parameterizable; Jython library; no JVM micro benchmarking; plotting; persistence; no trend analysis; no good statistics, but easily extensible.

Depends on Jython, HTTPClient, JEditSyntax, ApacheXMLBeans, PicoContainer.

TPTP

parameterizable!?; Java tool platform; no JVM micro benchmarking!?; plotting; persistence; graphical trend analysis; no statistics!?

The Test & Performance Tools Platform is a huge generic and extensible tool platform (based on Eclipse and four EMF models). Hence it is powerful but quite complex, can slow Eclipse down, and extending it for your own needs (e.g. with statistics so that they influence the number of iterations) seems to be very difficult.

Usemon

parameterizable!?; Java library; no JVM micro benchmarking; plotting; persistence; trend analysis!?; statistics!?.

Tool is tailored towards monitoring in large clusters.

DaveFar
  • It looks like junit-benchmarks [is actually open source, under the Apache 2 licence](https://github.com/carrotsearch/junit-benchmarks/blob/master/junit-benchmarks.LICENSE). – z0r Sep 10 '12 at 05:36
  • JMH: "It does, however, focus on the lower benchmarks, and follows a different paradigm than empiricism." -- What is that supposed to mean? – Aleksey Shipilev Dec 26 '13 at 22:43
  • @Aleksey, I was looking at the jmh examples and got the impression that measurements are only taken once. Thus I concluded it follows a different paradigm than the other tools, which repeat some measurement and take the mean and other statistics. Can you provide a link for jmh's statistics capabilities? – DaveFar Dec 27 '13 at 09:54
  • @Aleksey: Did I understand and revise your last sentence correctly? What did "to be consumed by processors" mean? – DaveFar Dec 27 '13 at 10:02
  • http://hg.openjdk.java.net/code-tools/jmh/file/c2af91629c91/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_01_HelloWorld.java, "The contract for the benchmark methods is very simple: annotate it with @GenerateMicroBenchmark, and you are set to go. JMH will run the test **by continuously calling this method**, and measuring the performance metrics for its execution." – Aleksey Shipilev Dec 27 '13 at 11:36
  • @DaveFar: http://hg.openjdk.java.net/code-tools/jmh/file/c2af91629c91/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_02_BenchmarkModes.java, run it, and observe the statistics capabilities. That is not to mention that the raw results include the sampling data, letting users digest it themselves. – Aleksey Shipilev Dec 27 '13 at 11:37
  • @DaveFar: Yes, the rephrasing seems more readable. Thanks! – Aleksey Shipilev Dec 27 '13 at 11:38
  • @Aleksey: taking a closer look at the examples, jmh seems pretty awesome. Will definitely give it a try during my next JVM performance benchmarks. – DaveFar Dec 27 '13 at 18:05
  • speedy-mcbenchmark sounds quite interesting to me from the site's description, but unfortunately, I do not see any code there (or anywhere else I could find) – bbarker Mar 31 '18 at 11:58

Another alternative is Caliper from Google. It allows parameterized testing.

sbridges
  • From `caliper` repo readme: `For new benchmarks, we recommend using a tool other than Caliper`: https://github.com/google/caliper#prefer-jmh-or-jetpack-microbenchmark – Marco Sulla Apr 18 '23 at 11:06

Try using http://labs.carrotsearch.com/junit-benchmarks.html. This is an extension to JUnit 4; features:

  • Records execution time average and standard deviation.
  • Garbage collector activity recording.
  • Per-benchmark JVM warm-up phase.
  • Per-run and historical chart generation.
  • Optional results persistence in the H2 SQL database (advanced querying, historical analysis).

Ula Krukar
  • A junit-rule for benchmarking that also computes the standard deviation - I love it (-> +1). Let's see whether it can do further statistics (e.g. decide on how many rounds to measure for a certain confidence) or be extended that way easily... – DaveFar Aug 29 '11 at 11:44
  • @Ula Krukar - The best thing about this framework is that it integrates seamlessly with existing JUnit tests. Caliper and JunitPerf do not, which sorted them out for me. +1 – kostja Dec 23 '11 at 15:17
  • If only it had CSV output - I'd rather graph the results myself. I guess it wouldn't be hard to write one... – z0r Sep 10 '12 at 06:17
  • According to the [junit-benchmarks](http://labs.carrotsearch.com/junit-benchmarks.html) home page: "The project has been deprecated in favor of [JMH](http://openjdk.java.net/projects/code-tools/jmh/)." – Marcin Jun 30 '15 at 19:25