
I have unit tests. If one of them fails, my build fails.

I would like to apply the same principle to performance. I have a series of microbenchmarks for several hot paths through a library. Empirically, slowdowns in these areas have a disproportionate effect on the library's overall performance.

It would be nice if there were some way to have some concept of a "performance build" that can fail in the event of a too-significant performance regression.

I had considered hard-coding thresholds that must not be exceeded. Something like:

Assert.IsTrue(hotPathTestResult.TotalTime <= threshold);

but pegging that to an absolute value is hardware and environment-dependent, and therefore brittle.
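For illustration, here is roughly what such a test would look like (RunHotPath and the 50 ms budget are placeholders, not values from my actual code):

    using System;
    using System.Diagnostics;
    using Microsoft.VisualStudio.TestTools.UnitTesting;

    [TestClass]
    public class HotPathPerformanceTests
    {
        [TestMethod]
        public void HotPath_CompletesWithinBudget()
        {
            // Hypothetical hard-coded budget -- exactly the kind of
            // hardware/environment-dependent value that makes this brittle.
            var threshold = TimeSpan.FromMilliseconds(50);

            var stopwatch = Stopwatch.StartNew();
            RunHotPath(); // placeholder for the library call under test
            stopwatch.Stop();

            Assert.IsTrue(stopwatch.Elapsed <= threshold,
                $"Hot path took {stopwatch.Elapsed}, budget was {threshold}.");
        }

        private static void RunHotPath()
        {
            // placeholder: exercise the hot path here
        }
    }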

Has anyone implemented something like this? What does Microsoft do for Kestrel?

rianjs
  • Are you currently asking performance questions of a single-user execution for response time and resource utilization (CPU, disk, RAM, network; how early/often/long, and the size of the working set)? If so, you can set your baseline expected result to your prior results and observe any deltas in time or resources (which drive performance and the ability to scale). – James Pulley May 31 '18 at 13:24

1 Answer


I would not do this via unit tests -- it's the wrong place. Do it in a build/test script instead. You gain more flexibility and can do many more things that may be necessary.

A rough outline would be:

  1. build
  2. run unit tests
  3. run integration tests
  4. run benchmarks
  5. upload benchmark results to a results store (e.g. a commercial product such as Power BI)
  6. compare current results with previous results
  7. upload artefacts / deploy packages

At step 6, if there is a regression you can let the build fail with a non-zero exit code.
BenchmarkDotNet can export results as JSON (among other formats), so you can take advantage of that.
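For example, a benchmark class can opt into the JSON exporter like this (the class and method names are only a sketch, and the attribute's namespace may differ between BenchmarkDotNet versions):

    using BenchmarkDotNet.Attributes;
    using BenchmarkDotNet.Running;

    // Emits a full JSON report alongside the usual console/markdown output.
    [JsonExporterAttribute.Full]
    public class HotPathBenchmarks
    {
        [Benchmark]
        public void HotPath()
        {
            // placeholder: exercise the library's hot path here
        }
    }

    public static class Program
    {
        public static void Main() => BenchmarkRunner.Run<HotPathBenchmarks>();
    }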

The point is how to determine whether a regression occurred. Especially on CI builds (with containers and the like) the hardware may differ between benchmark runs, so the results are not 1:1 comparable, and you have to take this into account.
Personally I don't let the script fail in the case of a possible regression; instead it sends a notification, so I can manually check whether it's a true regression or just noise caused by different hardware.

A regression is detected simply when the current results are worse than the median of the last 5 results. Of course this is a rough method, but it is an effective one, and you can tune it to your needs.
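As a sketch of how that check could look, assuming the benchmark means have already been extracted from the JSON reports into a simple history file (the file name, tolerance, and exit-code convention are all illustrative):

    using System;
    using System.IO;
    using System.Linq;

    public static class RegressionCheck
    {
        // Returns a non-zero exit code when the current mean time is worse
        // than the median of the last five recorded runs (plus some slack
        // to absorb hardware noise on CI agents).
        public static int Main(string[] args)
        {
            double current = double.Parse(args[0]); // current mean time, e.g. in nanoseconds

            double[] history = File.Exists("history.txt")
                ? File.ReadAllLines("history.txt")  // one mean time per line, oldest first
                      .Select(double.Parse)
                      .ToArray()
                : Array.Empty<double>();

            double[] lastFive = history.TakeLast(5).OrderBy(x => x).ToArray();

            if (lastFive.Length > 0)
            {
                double median = lastFive.Length % 2 == 1
                    ? lastFive[lastFive.Length / 2]
                    : (lastFive[lastFive.Length / 2 - 1] + lastFive[lastFive.Length / 2]) / 2.0;

                const double tolerance = 1.10; // allow ~10% slack for noisy CI hardware
                if (current > median * tolerance)
                {
                    Console.Error.WriteLine(
                        $"Possible regression: current {current} vs. median of last {lastFive.Length} runs {median}.");
                    return 1; // non-zero exit code fails the build script
                }
            }

            File.AppendAllLines("history.txt", new[] { args[0] });
            return 0;
        }
    }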

gfoidl