
Our log files store the response time of each request. What's the most efficient way to calculate the median response time and the "75/90/95% of requests were served in less than N time" numbers? (I guess a variation of my question is: what's the best way to calculate the median and standard deviation of a stream of numbers?)
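For the standard deviation part of the question, the mean and standard deviation (unlike the median) can be computed in a single pass with constant memory. A minimal sketch in Python using Welford's online algorithm; the function name `running_mean_stddev` is hypothetical, and it returns the population standard deviation (divide `m2` by `n - 1` instead for the sample version). An exact median or percentile generally can't be computed this way: it needs the values themselves, or an approximation.

```python
import math
from typing import Iterable, Tuple


def running_mean_stddev(values: Iterable[float]) -> Tuple[int, float, float]:
    """Welford's online algorithm: one pass, O(1) memory.

    Returns (count, mean, population standard deviation).
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # second factor uses the updated mean
    if n == 0:
        raise ValueError("no values")
    return n, mean, math.sqrt(m2 / n)
```

For the values `[2, 4, 4, 4, 5, 5, 7, 9]` this gives a count of 8, a mean of 5 and a population standard deviation of 2 (up to floating-point rounding). Because `values` can be any iterable, it can be fed directly from a log-reading generator without holding the whole file in memory.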

The best I came up with was reading all the numbers, sorting them, and then picking out the values at the right positions, but that seems really goofy. Isn't there a smarter way?
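The sort-and-pick approach described above is less wasteful than it may look: sorting costs O(n log n) once, and after that every percentile is a single array lookup. A minimal sketch, assuming the nearest-rank definition of a percentile (other definitions interpolate between neighbouring values, and for an even count the median is often taken as the mean of the two middle values); `percentiles` is a hypothetical helper name.

```python
import math
from typing import Dict, Iterable, List


def percentiles(values: Iterable[float], ps: Iterable[float]) -> Dict[float, float]:
    """Sort once, then read off several percentiles by nearest rank.

    For 0 < p <= 100, the nearest-rank p-th percentile is the element at
    1-based position ceil(p * n / 100) in sorted order.
    """
    data: List[float] = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("no values")
    result: Dict[float, float] = {}
    for p in ps:
        if not 0 < p <= 100:
            raise ValueError(f"percentile out of range: {p}")
        rank = math.ceil(p * n / 100)  # 1-based rank, always >= 1
        result[p] = data[rank - 1]
    return result
```

For example, `percentiles(range(1, 11), [50, 75, 90, 95])` returns `{50: 5, 75: 8, 90: 9, 95: 10}`. If only one or two percentiles are needed, a selection algorithm such as quickselect can find each in O(n) average time, but for a handful of percentiles over the same data, sorting once keeps things simple.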

We use Perl, but solutions for any language might be helpful.

brian d foy
Ask Bjørn Hansen
  • Show a sample of your logfile – xxxxxxx Sep 29 '09 at 07:56
  • hi spx2 - our logs are just newline-terminated JSON structures, where one of the elements is a list of various time counters (actual time, CPU time, etc.). I don't think it's too interesting; we'll do a map-reduce type thing to pull out the list of response times (by page type, etc.). – Ask Bjørn Hansen Sep 29 '09 at 08:07
  • I would have thought with your magic that 110% of the requests were served before they even left the requestor. :) – brian d foy Sep 29 '09 at 23:57
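Since the plan in the comments is a map-reduce pass grouped by page type, one alternative to keeping every response time is a fixed-bucket histogram: each mapper fills its own histogram, the reducer adds bucket counts together, and a percentile is read off as a bucket boundary. This gives approximate answers of exactly the "p% of requests were served in N ms or less" form. A minimal sketch; the bucket edges and the `LatencyHistogram` class are hypothetical and would need tuning to the real latency range, since the answer's precision is limited by bucket width.

```python
import bisect
import math
from typing import List, Sequence

# Hypothetical bucket upper bounds in milliseconds; tune to your latencies.
BUCKET_EDGES_MS: List[float] = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000]


class LatencyHistogram:
    """Fixed-bucket histogram: O(#buckets) memory, mergeable across workers."""

    def __init__(self, edges: Sequence[float] = BUCKET_EDGES_MS):
        self.edges = list(edges)
        # counts[i] holds values in (edges[i-1], edges[i]]; the last slot
        # is an overflow bucket for values above the largest edge.
        self.counts = [0] * (len(self.edges) + 1)
        self.total = 0

    def add(self, ms: float) -> None:
        i = bisect.bisect_left(self.edges, ms)  # index of first edge >= ms
        self.counts[i] += 1
        self.total += 1

    def merge(self, other: "LatencyHistogram") -> None:
        # The "reduce" step: bucket counts from different mappers just add up.
        if other.edges != self.edges:
            raise ValueError("bucket edges differ")
        for i, c in enumerate(other.counts):
            self.counts[i] += c
        self.total += other.total

    def upper_bound(self, p: float) -> float:
        """Smallest bucket edge with at least p% of values <= it.

        Returns math.inf if that rank falls in the overflow bucket.
        """
        if self.total == 0:
            raise ValueError("empty histogram")
        if not 0 < p <= 100:
            raise ValueError(f"percentile out of range: {p}")
        rank = math.ceil(p * self.total / 100)
        seen = 0
        for i, c in enumerate(self.counts):
            seen += c
            if seen >= rank:
                return self.edges[i] if i < len(self.edges) else math.inf
        return math.inf
```

For example, filling one histogram with 3, 7 and 15 ms and another with 40, 80, 400 and 900 ms, then merging the second into the first, gives `upper_bound(50) == 50` and `upper_bound(90) == 1000`. Grouping by page type would mean keeping one histogram per page type, for instance in a dict keyed by page type.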
