
I'm using an optimised version of Levenshtein's algorithm in some search code I'm building. I have functional unit tests to verify that the algorithm is returning the correct results, but in this context the performance of the algorithm is also hugely important.

I'm looking to add test coverage to the project so that if any future modification affects the optimisations, it shows up as a failing test. Because the algorithm is deterministic and runs against known test data, this could be as detailed as counting the number of instructions executed for a given set of test inputs. In other words, I'm not looking to measure algorithm performance with timers - I want to test the algorithm's internal behaviour, not just its output.
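
To give a concrete (if simplified) picture of what I mean, here's a toy, unoptimised Levenshtein implementation with a made-up work counter on it - not my actual code, just the shape of test I'd like to end up with:

using System;
using NUnit.Framework;

// Illustrative only: a plain dynamic-programming Levenshtein with a counter
// that records how much work the algorithm did.
public class CountingLevenshtein
{
    public long CellEvaluations { get; private set; }

    public int Distance(string a, string b)
    {
        var d = new int[a.Length + 1, b.Length + 1];
        for (int i = 0; i <= a.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= b.Length; j++) d[0, j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                CellEvaluations++; // one unit of "work" per matrix cell
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                                   d[i - 1, j - 1] + cost);
            }
        }
        return d[a.Length, b.Length];
    }
}

[TestFixture]
public class CountingLevenshteinTests
{
    [Test]
    public void Distance_KnownInputs_DoesExpectedAmountOfWork()
    {
        var algorithm = new CountingLevenshtein();

        Assert.That(algorithm.Distance("kitten", "sitting"), Is.EqualTo(3));
        // Deterministic inputs => deterministic count, regardless of CPU load.
        Assert.That(algorithm.CellEvaluations, Is.EqualTo(6 * 7));
    }
}

With deterministic inputs the count is deterministic, so a change that blows up the amount of work done fails the test no matter how busy the build server is.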

Any ideas how I would approach this in C#/.NET 4?

EDIT: The reason I don't want to just use wall-clock time is that it'll vary with CPU load and other factors outside the control of the test. That could lead to tests that fail when the build server is under load, for example. There will be wall-clock monitoring as part of the deployed system.

EDIT 2: Think of it this way... how would you apply red->green->refactor when performance is a critical requirement?

Dylan Beattie

  • Number of instructions executed doesn't necessarily equal performance. Why isn't wall-clock time the metric of interest here? – Oliver Charlesworth Mar 03 '13 at 01:13
  • Agreed - but if the number of instructions executed in response to a specific set of inputs was to change dramatically as a result of a modification to the code, I'd like to know about it. That's the sort of thing I'm looking for. – Dylan Beattie Mar 03 '13 at 01:14
  • I wouldn't use unit tests to verify that. If devs are changing working, complete, highly optimised code then they should know exactly what they are doing. It's not that likely that this code will ever need to be 'tweaked' right? So why bother? Performance is a non-functional characteristic. – Mitch Wheat Mar 03 '13 at 01:14
  • Run some timers in a unit test. Fail if they're an order of magnitude larger than they should be. They should still pass even under CPU load; if they're failing due to server load, put them on a server that's less loaded. – Robert Harvey Mar 03 '13 at 01:21
  • @RobertHarvey: A unit test that may randomly pass or fail based on external factors is arguably not a proper unit test... – Oliver Charlesworth Mar 03 '13 at 01:46
  • @DylanBeattie: I see your point about CPU load, etc. But instruction count (or something equivalent) could vary considerably without any effect on runtime, so whilst it may be deterministic, it's probably not a great metric for test success/failure. – Oliver Charlesworth Mar 03 '13 at 01:50
  • @OliCharlesworth: Well, if it fails, you know there's a problem. – Robert Harvey Mar 03 '13 at 03:54

1 Answer


I'm going to answer the third part of your question, since I've done this with some success several times.

how would you apply red->green->refactor when performance is a critical requirement?

  1. Write pinning tests to catch regressions in the code you plan to change and in any other methods that may slow down as a result of your changes.
  2. Write a performance test that fails.
  3. Make performance improvements, running all tests frequently.
  4. Update your pinning tests to more closely pin the performance.

Write pinning tests

Create a helper method like this to time what you want to pin.

private TimeSpan Time(Action toTime)
{
    var timer = Stopwatch.StartNew();
    toTime();
    timer.Stop();
    return timer.Elapsed;
}

Then write a test that asserts your method takes no time:

[Test]
public void FooPerformance_Pin()
{
    Assert.That(Time(() => fooer.Foo()), Is.LessThanOrEqualTo(TimeSpan.FromSeconds(0)));
}

When it fails (with the actual time elapsed in the failure message), update the time with something slightly more than the actual time. Rerun and it will pass. Repeat this for other functions whose performance you might impact with your changes, ending up with something like this.

[Test]
public void FooPerformance_Pin()
{
    Assert.That(Time(() => fooer.Foo()), Is.LessThanOrEqualTo(TimeSpan.FromSeconds(0.8)));
}

[Test]
public void BarPerformance_Pin()
{
    Assert.That(Time(() => fooer.Bar()), Is.LessThanOrEqualTo(TimeSpan.FromSeconds(6)));
}

Write a failing performance test

I like to call this kind of test a "baiting test". It's just the first step of a pinning test.

[Test]
public void FooPerformance_Bait()
{
    Assert.That(Time(() => fooer.Foo()), Is.LessThanOrEqualTo(TimeSpan.FromSeconds(0)));
}

Now, work on performance improvements. Run all the tests (pinning and baiting) after each tentative improvement. If you are successful, you'll see the time going down in the failure output of the baiting test, and none of your pinning tests will fail.

When you are satisfied with the improvements, update the pinning test for the code you changed, and delete the baiting test.
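
For example, if the improvements bring Foo down to roughly 0.3 seconds (a made-up number, purely for illustration), the updated pinning test might look like this:

[Test]
public void FooPerformance_Pin()
{
    Assert.That(Time(() => fooer.Foo()), Is.LessThanOrEqualTo(TimeSpan.FromSeconds(0.35)));
}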

What do you do with these tests now?

The least worrisome thing to do is to mark these tests with the Explicit attribute, and keep them around for the next time you want to check performance.
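
For example, with NUnit (the reason string is optional):

[Test, Explicit("Performance pin - run on demand or on the dedicated perf agent")]
public void FooPerformance_Pin()
{
    Assert.That(Time(() => fooer.Foo()), Is.LessThanOrEqualTo(TimeSpan.FromSeconds(0.8)));
}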

On the opposite side of the work spectrum, creating a reasonably well-controlled subsystem in CI for running this kind of test is a really good way to monitor performance regressions. In my experience there is far more worry about them "failing randomly due to CPU load from something else" than there are actual failures. The success of this kind of effort depends more on team culture than on your ability to exercise control over the environment.
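
If jitter from a busy agent does turn out to be a problem, one cheap mitigation (just a sketch, in the same spirit as the Time helper above) is to time several runs and pin against the fastest one, since the fastest run is the one least affected by whatever else the box is doing:

private TimeSpan BestTime(Action toTime, int runs = 3)
{
    // Keep the fastest of several runs; it's the least polluted by other load.
    var best = TimeSpan.MaxValue;
    for (int i = 0; i < runs; i++)
    {
        var timer = Stopwatch.StartNew();
        toTime();
        timer.Stop();
        if (timer.Elapsed < best)
            best = timer.Elapsed;
    }
    return best;
}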

tallseth

  • I'm not sure I understand your final comment. Surely it's **everything** to do with having a well-controlled CI that ensures that performance tests run on a specific machine, one at a time, with as many background processes, etc. disabled as possible? And perhaps also ensuring that e.g. caches are flushed before the run starts. If you don't do most/all of the above, I don't see how you can do meaningful performance regression (unless you set a very loose threshold). – Oliver Charlesworth Mar 03 '13 at 10:56
  • I think you're probably right - particularly concerning the need for a well-controlled CI subsystem for monitoring performance regressions. I'm thinking a dedicated TeamCity build agent is probably a good approach here - we've done similar things for stuff like long-running Selenium tests before. Thanks for such a comprehensive answer. – Dylan Beattie Mar 03 '13 at 11:14
  • @OliCharlesworth, have you tried having performance tests like this running in CI? There was a **lot** of pushback the first time I did it. The fact was that while it did take work to set up, it was not a lot of work to get them stable. It was a lot more work to argue with naysayers before we tried it. YMMV, most of these tests measure something on the order of seconds, not milliseconds. Very fast perf tests will be more fragile to external conditions, I expect. – tallseth Mar 04 '13 at 00:24
  • There are some good ideas in this answer too: http://stackoverflow.com/a/16157458/64105 – asgerhallas Jun 07 '13 at 18:44