1

I have a log of events for when a device either starts or stops with a failure code and I'm trying to calculate the average and mean times between a failure and a start. Here's a very simple example table of data:

+----+-----------+---------------------+
| id | eventName | eventTime           |
+----+-----------+---------------------+
|  1 | start     | 2012-11-01 14:25:20 |
|  2 | fail A    | 2012-11-01 14:27:45 |
|  3 | start     | 2012-11-01 14:30:49 |
|  4 | fail B    | 2012-11-01 14:32:54 |
|  5 | start     | 2012-11-01 14:35:59 |
|  6 | fail A    | 2012-11-01 14:37:02 |
|  7 | start     | 2012-11-01 14:38:05 |
|  8 | fail A    | 2012-11-01 14:40:09 |
|  9 | start     | 2012-11-01 14:41:11 |
| 10 | fail C    | 2012-11-01 14:43:14 |
+----+-----------+---------------------+

Create code:

CREATE TABLE `test` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `eventName` varchar(50) NOT NULL,
  `eventTime` datetime NOT NULL,
  PRIMARY KEY (`id`)
);
INSERT INTO `test` (`id`, `eventName`, `eventTime`) VALUES (1,'start','2012-11-01 14:25:20'),(2,'fail A','2012-11-01 14:27:45'),(3,'start','2012-11-01 14:30:49'),(4,'fail B','2012-11-01 14:32:54'),(5,'start','2012-11-01 14:35:59'),(6,'fail A','2012-11-01 14:37:02'),(7,'start','2012-11-01 14:38:05'),(8,'fail A','2012-11-01 14:40:09'),(9,'start','2012-11-01 14:41:11'),(10,'fail C','2012-11-01 14:43:14');

I can get the times between a start and a failure using something like this:

SET @time_prev := -1;
SELECT
 *
FROM
(
  SELECT
    eventName
  , eventTime
  , @ts := UNIX_TIMESTAMP(eventTime) AS ts
  , @started := IF(eventName = 'start', 1, 0) AS started
  , @failed := IF(eventName <> 'start', 1, 0) AS failed
  , @time_diff := IF(@time_prev > -1, @ts - @time_prev, 0) AS time_diff
  , @time_prev := @ts AS time_prev
  , @time_to_fail := IF(@failed, @time_diff, 0) AS time_to_fail
  , @time_to_start := IF(@started, @time_diff, 0) AS time_to_start
  FROM
    test
) AS t1;

+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
| eventName | eventTime           | ts         | started | failed | time_diff | time_prev  | time_to_fail | time_to_start |
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
| start     | 2012-11-01 14:25:20 | 1351805120 |       1 |      0 |         0 | 1351805120 | 0            | 0             |
| fail A    | 2012-11-01 14:27:45 | 1351805265 |       0 |      1 |       145 | 1351805265 | 0            | 145           |
| start     | 2012-11-01 14:30:49 | 1351805449 |       1 |      0 |       184 | 1351805449 | 184          | 0             |
| fail B    | 2012-11-01 14:32:54 | 1351805574 |       0 |      1 |       125 | 1351805574 | 0            | 125           |
| start     | 2012-11-01 14:35:59 | 1351805759 |       1 |      0 |       185 | 1351805759 | 185          | 0             |
| fail A    | 2012-11-01 14:37:02 | 1351805822 |       0 |      1 |        63 | 1351805822 | 0            | 63            |
| start     | 2012-11-01 14:38:05 | 1351805885 |       1 |      0 |        63 | 1351805885 | 63           | 0             |
| fail A    | 2012-11-01 14:40:09 | 1351806009 |       0 |      1 |       124 | 1351806009 | 0            | 124           |
| start     | 2012-11-01 14:41:11 | 1351806071 |       1 |      0 |        62 | 1351806071 | 62           | 0             |
| fail C    | 2012-11-01 14:43:14 | 1351806194 |       0 |      1 |       123 | 1351806194 | 0            | 123           |
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+

But in order to get time between failure and start I have to move ahead to the next record and lose the grouping of that failure code. How can I move this to the next level and get the future to time start merged into a failed record so it can be grouped?

Ultimately, after calculating averages and medians I would end up with a result set like this:

+-----------+-------------+----------------+--------------+-----------------+
| eventName | avg_to_fail | median_to_fail | avg_to_start | median_to_start |
+-----------+-------------+----------------+--------------+-----------------+
|    fail A |      110.66 |         124.00 |       103.00 |           63.00 |
|    fail B |      125.00 |         125.00 |       185.00 |          185.00 |
+-----------+-------------+----------------+--------------+-----------------+
fwrawx
  • 171
  • 1
  • 10

1 Answers1

1

This gives the averages Medians are a pain in SQL. Simple way to calculate median with MySQL gives some ideas. The two inner queries give the result sets to median over were there a median aggregate.

Select
  times.eventName,
  avg(times.timelapse) as avg_to_fail,
  avg(times2.timelapse) as avg_to_start
From (
  Select
    starts.id,
    starts.eventName,
    TimestampDiff(SECOND, starts.eventTime, Min(ends.eventTime)) as timelapse
  From
    Test as starts,
    Test as ends
  Where
    starts.eventName != 'start' And
    ends.eventName = 'start' And
    ends.eventTime > starts.eventTime
  Group By
    starts.id
) as times2
  Right Outer Join (
  Select
    starts.id,
    ends.eventName,
    TimestampDiff(SECOND, starts.eventTime, Min(ends.eventTime)) as timelapse
  From
    Test as starts,
    Test as ends
  Where
    starts.eventName = 'start' And
    ends.eventName != 'start' And
    ends.eventTime > starts.eventTime
  Group By
    starts.id
) as times
  On times2.EventName = times.EventName
Group By
  Times.eventName

To aid understanding I'd first consider

Select
  starts.id,
  ends.eventName,
  starts.eventTime, 
  ends.eventTime
From
  Test as starts,
  Test as ends
Where
  starts.eventName = 'start' And
  ends.eventName != 'start' And
  ends.eventTime > starts.eventTime

This is the essence of the inner query times without the group by and the min statement. You'll see this has a row combining every start event with every end event where the end event is after the start event. Call this X.

The next part is

Select
  X.startid,
  X.endeventname,
  TimestampDiff(SECOND, X.starttime, Min(x.endTime)) as timelapse
From
  X
Group By
  X.startid

The key here is that Min(x.endTime) combines with the group by. So we're getting the earliest end time after the start time (as X already constrained it to be after). Although I've only picked out the columns we need to use, we have access to start time id, end time id start event, end event, start time, min(end time) here. The reason you can adapt it to find the avg_to_start is because we pick the interesting event name, as we have both.

SQL Fiddle: http://sqlfiddle.com/#!2/90465/6

Community
  • 1
  • 1
Laurence
  • 10,896
  • 1
  • 25
  • 34
  • I removed median from the title, it's not the problem. The problem is calculating the second avg/median based on the next row of data. – fwrawx Nov 01 '12 at 22:57
  • @fwrawx - I've updated it to give avg_to_fail as per your specification. It's pretty easy to adapt to give avg_to_start. You could then full outer join the two result sets on EventName. – Laurence Nov 01 '12 at 22:59
  • *_to_fail is the easy part, getting *_to_start and merging is the hard part because 1) the eventName is the same for all the records and 2) the time is calculated from the previous record – fwrawx Nov 01 '12 at 23:32
  • @fwrawx - I've updated it to give to_start as well. It's the same pattern. Some of the table aliases may be confusing now... – Laurence Nov 01 '12 at 23:40
  • Well that certainly produces the desired results with the example data. Any chance of explaining how this works so I can understand how to adapt the concept to my real data? – fwrawx Nov 02 '12 at 15:24
  • I've put some more detailed explanation in. – Laurence Nov 02 '12 at 15:52
  • Excellent! In reality my data and requirements are a bit more complex. I hate to comment spam and and can't chat so I posted another example with one slight problem: [link](http://sqlfiddle.com/#!2/7f1e7/3) – fwrawx Nov 02 '12 at 17:48
  • You could basically run the query I wrote on the result set in your question. If you can filter down the extended table in the same way, it should be adaptable. – Laurence Nov 02 '12 at 19:18