
I am searching for good (preferably plug-and-play) solutions for performing diagnostics on software I am developing. The software has several components that require extensive computing resources, so we're attempting to capture the performance of these components for two reasons: 1) to estimate the computing resources required, and thus the cost of running the software, and 2) to quantify what an "improvement" to a component is (i.e. if we modify the code and it runs faster, that's an improvement). Our application is composed of a search engine plus many other components, and the speed of the search engine is also critical to the end user.

It has been hard to search for a solution since I'm not sure how to properly define my problem, and what I've found so far covers only basic error-logging techniques. A solution designed for running statistics (e.g. statistical regressions) over the captured data would be best. Unit-testing frameworks may have built-in test timers, but we need to capture data from live runs of our application to account for the many different scenarios it encounters.
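To make the live-run requirement concrete, here is a minimal sketch of the kind of instrumentation we have in mind, in Java (one of the two languages we mainly use). All names here are made up for illustration: a hypothetical `TimedCall.timed` wrapper that measures a component call's wall-clock time and appends one CSV row per call for later analysis.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.function.Supplier;

// Hypothetical helper: wraps a component call, measures wall-clock time,
// and appends one CSV row (component, input size, elapsed ms, timestamp).
public class TimedCall {
    public static <T> T timed(String component, long inputSize, Supplier<T> work) {
        long start = System.nanoTime();              // wall-clock, not CPU time
        T result = work.get();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        log(component, inputSize, elapsedMs);
        return result;
    }

    private static synchronized void log(String component, long inputSize, long ms) {
        // Opening the file per call keeps the sketch short; a real system
        // would hand rows to a logging framework instead.
        try (PrintWriter out = new PrintWriter(new FileWriter("perf.csv", true))) {
            out.printf("%s,%d,%d,%d%n", component, inputSize, ms,
                       System.currentTimeMillis());
        } catch (IOException e) {
            // Instrumentation must never break the live application.
        }
    }
}
```

A call site would then look like `TimedCall.timed("search", query.length(), () -> engine.run(query))`, where `engine.run` stands in for whatever component is being measured.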

So really there are two questions:

1) Is there a predefined solution for these sorts of tests?

2) Is there any good reference for running statistical regressions on this kind of data? Say we capture the execution time of a script and the size of its input data (e.g. the query). We can regress time on input size to understand the effect of changing the data size on execution time. But these regressions are tricky, since it's not clear what all of the relevant variables are. Any reference on analyzing performance data would be excellent, and would benefit many people, I believe!
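To show what I mean by the simplest version of that regression, here is an ordinary-least-squares fit of time on input size, computed directly from logged pairs. The numbers are invented purely for illustration; a real analysis would add more predictors (server type, cache state, query class, ...).

```java
// Ordinary least squares of execution time on input size,
// computed from (size, time) pairs pulled from a performance log.
public class TimeVsSize {
    public static void main(String[] args) {
        double[] size = { 100, 200, 400, 800, 1600 };   // e.g. query size
        double[] ms   = {  12,  25,  48, 101,  195 };   // measured wall-clock ms

        int n = size.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx  += size[i];
            sy  += ms[i];
            sxx += size[i] * size[i];
            sxy += size[i] * ms[i];
        }
        double slope = (n * sxy - sx * sy) / (n * sxx - sx * sx);
        double intercept = (sy - slope * sx) / n;
        // slope = marginal ms per unit of input; intercept = fixed overhead
        System.out.printf("time ~ %.3f + %.4f * size%n", intercept, slope);
    }
}
```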

Thanks Matt

Matt De Leon

1 Answer


Big apps like these are going to be doing a lot of non-CPU processing, so to find optimization points you're going to need wall-clock-based, not CPU-based, sampling.

gprof and some others only sample on CPU time, so they cannot see needless I/O or other system calls. If you do manage to find and remove CPU-intensive performance problems, the I/O-intensive ones will only become a larger fraction of the time.
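Here is a minimal sketch of that distinction, using nothing beyond the standard library's `ThreadMXBean`: a blocking call (a sleep standing in for I/O) dominates the wall clock while contributing almost nothing to CPU time, so a CPU-time profiler never sees it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class WallVsCpu {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long wall0 = System.nanoTime();
        long cpu0  = bean.getCurrentThreadCpuTime();

        Thread.sleep(500);                    // "I/O": blocked, no CPU used
        long spin = 0;
        for (int i = 0; i < 50_000_000; i++) spin += i;   // CPU-bound work

        long wallMs = (System.nanoTime() - wall0) / 1_000_000;
        long cpuMs  = (bean.getCurrentThreadCpuTime() - cpu0) / 1_000_000;
        // A CPU-time profiler sees only the loop; wall clock sees the sleep too.
        System.out.println("wall=" + wallMs + "ms cpu=" + cpuMs + "ms spin=" + spin);
    }
}
```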

Take a look at Zoom. It's a stack sampler that reports, by line of code, the percent of wall-clock time that line is on the stack. Any code point worth optimizing will probably be such a line. It also has a nice butterfly view for browsing the call graph. (You don't want the call graph as a whole. It will be a meaningless rat's nest.)
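To illustrate the technique (this is only a bare-bones sketch of wall-clock stack sampling in Java, not how Zoom itself works): a daemon thread periodically grabs every thread's stack and counts how often each code location appears. The lines with the highest counts are on the stack the largest fraction of wall-clock time.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Bare-bones in-process wall-clock stack sampler: counts how often each
// code location (class, method, line) appears in periodic stack samples.
public class StackSampler {
    private final Map<String, Integer> hits = new ConcurrentHashMap<>();

    public void start(long intervalMs) {
        Thread t = new Thread(() -> {
            while (true) {
                for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
                    for (StackTraceElement frame : stack) {
                        hits.merge(frame.toString(), 1, Integer::sum);
                    }
                }
                try { Thread.sleep(intervalMs); } catch (InterruptedException e) { return; }
            }
        });
        t.setDaemon(true);   // sampling must never keep the app alive
        t.start();
    }

    public void report() {
        // Highest counts = on the stack the largest fraction of wall-clock
        // time -- the code points worth looking at first.
        hits.entrySet().stream()
            .sorted((a, b) -> b.getValue() - a.getValue())
            .limit(20)
            .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Calling `report()` after a representative live run prints the top 20 locations by sample count.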

Mike Dunlavey
  • Thanks, this is an excellent start. I am looking for good performance metrics for PHP and Java mainly; I'm not sure Zoom or gprof will work for those two. Nevertheless, reading the docs for them has given me some great info. – Matt De Leon May 22 '11 at 23:16
  • @Matt: You're welcome. When the goal is to make code run faster (as opposed to just measuring how fast it is) [random pausing](http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024) is a very effective language-independent technique. – Mike Dunlavey May 23 '11 at 02:19
  • Mike, any idea if these or other profilers can capture custom variables? For example, we need to log amount of data being processed, type of server (since we'll be experimenting with different servers), etc. Thanks! – Matt De Leon May 26 '11 at 14:38
  • @Matt: I often have to say that measuring and diagnosing are different tasks. For measuring, logging and/or wall-clock (not CPU) profilers are probably what you need (instrumenting type, most likely). For diagnosis, a wall-clock stack sampler that reports % by line is a much better tool. A simple example is a loop that runs much longer than necessary. A single stack sample won't measure it, but it *will* find it. – Mike Dunlavey May 26 '11 at 14:59