2

Are there any known techniques (and resources related to them, like research papers or blog entries) which describe how to dynamically and programmatically detect the part of the code that caused a performance regression, preferably on the JVM or some other virtual machine environment (where techniques such as instrumentation can be applied relatively easily)?

In particular, with a large codebase and a large number of committers to a project (for example, an OS, a language or some framework), it is sometimes hard to find out which change caused a performance regression. A paper such as this one goes a long way towards describing how to detect a performance regression (e.g. in a certain snippet of code), but not how to dynamically find the piece of code in the project that was changed by some commit and caused the regression.

I was thinking that this might be done by instrumenting pieces of the program to detect the exact method that causes the regression, or at least to narrow down the range of possible causes of the performance regression.
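To make that concrete, here is a rough sketch of the kind of instrumentation I have in mind (the hook is called by hand here, but a java.lang.instrument agent could inject it at method entry/exit; all names and the baseline format are made up for illustration):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Rough sketch: accumulate per-method timings at runtime and compare them
// against a baseline recorded on an earlier commit, to narrow down which
// method regressed. Names like recordTiming/reportRegressions are made up.
public final class RegressionProbe {

    private static final Map<String, Long> totalNanos = new ConcurrentHashMap<>();

    // A bytecode instrumentation agent would inject calls like this at
    // method entry/exit; here it would be invoked by hand.
    public static void recordTiming(String method, long nanos) {
        totalNanos.merge(method, nanos, Long::sum);
    }

    // Flag methods whose accumulated time grew noticeably versus a baseline
    // captured on an earlier commit (how the baseline is stored is left open).
    public static void reportRegressions(Map<String, Long> baseline, double threshold) {
        totalNanos.forEach((method, nanos) -> {
            Long before = baseline.get(method);
            if (before != null && nanos > before * threshold) {
                System.out.printf("Possible regression in %s: %d ns -> %d ns%n",
                        method, before, nanos);
            }
        });
    }
}
```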

Does anyone know about anything written about this, or any project using such performance regression detection techniques?

EDIT:

I was referring to something along these lines, but doing further analysis into the codebase itself.

axel22
  • 32,045
  • 9
  • 125
  • 137

4 Answers

2

Perhaps not entirely what you are asking, but on a project I've worked on with extreme performance requirements, we wrote performance tests using our unit testing framework, and glued them into our continuous integration environment.

This meant that on every check-in, our CI server would run tests that validated we hadn't slowed the functionality down beyond our acceptable boundaries.

It wasn't perfect - but it did allow us to keep an eye on our key performance statistics over time, and it caught check-ins that affected the performance.

Defining "acceptable boundaries" for performance is more an art than a science - in our CI-driven tests, we took a fairly simple approach, based on the hardware specification; we would fail the build if the performance tests exceeded a response time of more than 1 second with 100 concurrent users. This caught a bunch of lowhanging fruit performance issues, and gave us a decent level of confidence on "production" hardware.

We explicitly didn't run these tests before check-in, as that would slow down the development cycle - forcing a developer to run through fairly long-running tests before checking in encourages them not to check in too often. We also weren't confident we'd get meaningful results without deploying to known hardware.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • How was the acceptable range defined? Could this be extended to catch regressions *before* checkin? – bukzor Nov 05 '13 at 18:38
  • I suppose that given a DVCS and some kind of CI system, you could run these performance checks on development branches before they were eligible for merge. This could be implemented on github+travis for little open-source projects. I'd still like to bring more science to setting bounds and flagging errors. – bukzor Nov 06 '13 at 19:03
  • I think _everyone_ would like more science in this area. – Neville Kuyt Nov 09 '13 at 11:46
  • Stand back. I'm going to try science! – bukzor Nov 17 '13 at 01:06
1

With tools like YourKit you can take a snapshot of the performance breakdown of a test or application. If you run the application again, you can compare performance breakdowns to find differences.

Performance profiling is more of an art than a science. I don't believe you will find a tool which tells you exactly what the problem is; you have to use your judgement.

For example, say you have a method which is taking much longer than it used to. Is it because the method has changed, because it is being called in a different way, or because it is being called much more often? You have to use some judgement of your own.
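As an illustration, if you export the per-method times from two runs into simple CSV files (the "method,milliseconds" format here is assumed, not a specific YourKit feature), a small diff like the sketch below will show you which methods grew the most between the two breakdowns:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

// Sketch: compare two per-method timing dumps and print the methods whose
// time grew the most between the "before" and "after" snapshots.
public class SnapshotDiff {

    public static void main(String[] args) throws IOException {
        Map<String, Long> before = load(Paths.get(args[0]));
        Map<String, Long> after = load(Paths.get(args[1]));

        after.entrySet().stream()
             .filter(e -> e.getValue() > before.getOrDefault(e.getKey(), 0L))
             .sorted((a, b) -> Long.compare(
                     b.getValue() - before.getOrDefault(b.getKey(), 0L),
                     a.getValue() - before.getOrDefault(a.getKey(), 0L)))
             .limit(20)
             .forEach(e -> System.out.printf("%s: %d ms -> %d ms%n",
                     e.getKey(), before.getOrDefault(e.getKey(), 0L), e.getValue()));
    }

    // Each line of the dump is assumed to be "fully.qualified.method,millis".
    private static Map<String, Long> load(Path file) throws IOException {
        Map<String, Long> times = new HashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] parts = line.split(",");
            times.put(parts[0], Long.parseLong(parts[1].trim()));
        }
        return times;
    }
}
```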

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • Thanks! I agree - it's not easy to analyze exactly why the method is being run much more often. I was wondering if there are heuristics which give at least an approximate answer. – axel22 Oct 03 '11 at 19:56
  • You can use the report to compare the busiest methods before and after a change. It can be worth examining one of the methods which increases. – Peter Lawrey Oct 03 '11 at 20:09
1

JProfiler allows you to see a list of instrumented methods which you can sort by average execution time, inherent time, number of invocations, etc. I think that if this information is saved across releases, one can get some insight into regressions. Of course, the profiling data will not be accurate if the tests are not exactly the same.

Ashwinee K Jha
  • 9,187
  • 2
  • 25
  • 19
  • Thank you. Interesting - is there a library support also, or a writeup about how JProfiler does this? – axel22 Oct 03 '11 at 19:57
  • The "one can get some insight" part is the hard part, in my mind. People don't tend to dredge these data until there's been an obvious regression, which makes me want to automate it. – bukzor Nov 05 '13 at 18:39
1

Some people are aware of a technique for finding (as opposed to measuring) the cause of excess time being taken.

It's simple, but it's very effective.

Essentially it is this:

If the code is slow it's because it's spending some fraction F (like 20%, 50%, or 90%) of its time doing something X unnecessary, in the sense that if you knew what it was, you'd blow it away, and save that fraction of time.

During the general time it's being slow, at any random nanosecond the probability that it's doing X is F.

So just drop in on it, a few times, and ask it what it's doing. And ask it why it's doing it.

Typical apps spend nearly all their time either waiting for some I/O to complete or for some library function to return.

If there is something in your program taking too much time (and there is), it is almost certainly one or a few function calls that you will find on the call stack, being done for lousy reasons.

Here's more on that subject.
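Here's a throwaway sketch of what "dropping in on it" can look like on the JVM - the worker thread is just a stand-in, and normally you would pause the real process in a debugger or run jstack against its pid and read the stacks by eye:

```java
// Take a handful of stack samples of a running thread and print them.
// If the same call shows up on most samples, that call is where the time
// is going - and the rest of the stack tells you *why* it is being made.
public class RandomPauser {

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(RandomPauser::doSlowWork, "worker");
        worker.start();

        for (int sample = 1; sample <= 5; sample++) {
            Thread.sleep(2000); // roughly "random" moments while it is slow
            StackTraceElement[] stack = worker.getStackTrace();
            System.out.println("--- sample " + sample + " ---");
            for (StackTraceElement frame : stack) {
                System.out.println("    at " + frame);
            }
        }
        worker.interrupt();
    }

    private static void doSlowWork() {
        // stand-in for the code being investigated
        while (!Thread.currentThread().isInterrupted()) {
            Math.sqrt(Math.random());
        }
    }
}
```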

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135