
I use Java in this question but this really applies to all modern app development. Our "environment pipeline", like many of them, looks like this:

  • Developer sandbox
  • Continuous integration & testing
  • QA/Staging
  • Production

The hardware, available RAM, and CPU in each of these environments are different: my laptop is a 2GB dual-core Windows machine. Testing runs on a 4GB machine. Production is two (load-balanced) 8GB, quad-core servers.

Obviously the same code will perform differently when it runs on these different machines (environments).

I was thinking about writing automated performance tests for some of my classes that would be of the form:

private static final long MAX_TIME = 8000; // maximum allowed time, in milliseconds

@Test
public final void perfTestSomething() {
    long start = System.currentTimeMillis();

    // Run the code under test

    long end = System.currentTimeMillis();

    assertTrue((end - start) < MAX_TIME);
}

Thus the automated performance test fails if the test takes more than, say, 8 seconds to run.

But then this realization dawned on me: the code will run differently in different environments, and will run differently depending on the state of the JVM and the garbage collector. I could run the same test 1000 times on my own machine and get wildly different results.

So I ask: how does one accurately/reliably define & gauge automated performance tests as code is promoted from one environment to the next?

Thanks in advance!

IAmYourFaja

2 Answers


I could run the same test 1000 times on my own machine and have wildly different results.

Actually, that's unlikely. There will of course be some variability, but if the machine isn't being heavily loaded by other tasks, the majority of the 1000 timings will be fairly close together.

One way to get meaningful and stable numbers is to run the test many times and then look at certain percentiles of the timings (e.g. the median, the 90th percentile, the 99th, and so on).
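
For illustration, here's a rough sketch of that approach in plain JUnit (the run count, the choice of the 90th percentile, and the 8-second budget are arbitrary assumptions, not recommendations):

import org.junit.Test;

import java.util.Arrays;

import static org.junit.Assert.assertTrue;

public class PercentilePerfTest {

    private static final int RUNS = 100;              // assumed sample size
    private static final long MAX_P90_MILLIS = 8000;  // assumed budget for the 90th percentile

    @Test
    public void somethingStaysWithinBudget() {
        long[] timings = new long[RUNS];
        for (int i = 0; i < RUNS; i++) {
            long start = System.currentTimeMillis();
            // Run the code under test here
            timings[i] = System.currentTimeMillis() - start;
        }
        Arrays.sort(timings);
        // 90th percentile of the sorted sample
        long p90 = timings[(int) Math.ceil(0.9 * RUNS) - 1];
        assertTrue("90th percentile was " + p90 + "ms", p90 < MAX_P90_MILLIS);
    }
}

Asserting on a percentile rather than a single timing makes the test far less sensitive to one unlucky GC pause.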

There are additional complications that arise if the unit of your testing is smaller than a single invocation of the JVM (say, you're testing a single method or a group of related methods). If that's the case, I strongly recommend reading How do I write a correct micro-benchmark in Java?
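
If you do end up micro-benchmarking, a harness such as JMH takes care of warm-up, forking, and statistical reporting for you. A minimal sketch, assuming the JMH dependency is on the classpath (the class name and iteration counts are placeholders, not tuned values):

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.concurrent.TimeUnit;

public class SomethingBenchmark {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void measureSomething() {
        // Call the method under test here
    }

    public static void main(String[] args) throws Exception {
        Options opts = new OptionsBuilder()
                .include(SomethingBenchmark.class.getSimpleName())
                .warmupIterations(5)       // let the JIT settle before measuring
                .measurementIterations(10)
                .forks(1)
                .build();
        new Runner(opts).run();
    }
}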

NPE
  • Excellent advice - thanks aix, I upvoted your answer but had to give the green check to rfreak because it was just a *teeny* bit more applicable to my situation. Thank you infinitely, though. – IAmYourFaja Feb 02 '12 at 22:00

It may be that you only want to run the performance tests in one location that is more tightly controlled. You don't necessarily need to run them in all environments; there's little benefit in that. You should run them in the environment that most closely mimics a production configuration (that's what you REALLY care about, right?).
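
One way to arrange that with JUnit is to guard the performance tests with an assumption, so they are skipped everywhere except the environment where you have explicitly enabled them (the PERF_TESTS_ENABLED variable name is just an assumption for illustration):

import org.junit.Before;
import org.junit.Test;

import static org.junit.Assume.assumeTrue;

public class GatedPerfTest {

    @Before
    public void onlyRunInThePerfEnvironment() {
        // Skips (rather than fails) the tests everywhere except the
        // environment where PERF_TESTS_ENABLED=true has been set.
        assumeTrue("true".equals(System.getenv("PERF_TESTS_ENABLED")));
    }

    @Test
    public void perfTestSomething() {
        // timing logic as in the question
    }
}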

Also, make sure you give yourself reasonable overhead in your performance restrictions. Don't lock them down to just above what your server does NOW. Select some reasonable thresholds to account for some variation in the current run.
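
For example (the figures are made up): if a representative run currently takes around 5 seconds in your performance environment, budget well above that rather than asserting at 5.1 seconds:

// Hypothetical numbers: ~5s measured baseline, ~50% headroom on top of it
private static final long BASELINE_MILLIS = 5000;
private static final long MAX_TIME = (long) (BASELINE_MILLIS * 1.5); // 7500ms budget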

Long term, what I've found more useful is a graph of the performance numbers over time rather than a hard limit. That way we can watch how various pieces of functionality trend over time and attack them when they trend too high.
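
As a starting point for collecting that history (the file name and CSV format are assumptions; a CI plugin or metrics system would do this more robustly), each performance test could append its timing to a file that the build archives, and you graph it later:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class PerfLog {

    // Appends one "timestampMillis,testName,durationMillis" row per run
    public static void record(String testName, long durationMillis) {
        try (PrintWriter out = new PrintWriter(new FileWriter("perf-history.csv", true))) {
            out.println(System.currentTimeMillis() + "," + testName + "," + durationMillis);
        } catch (IOException e) {
            throw new RuntimeException("Could not record performance sample", e);
        }
    }
}

A test would then call PerfLog.record("something", end - start) instead of (or alongside) asserting against a hard limit.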

rfeak