2

As a follow-up to my previous question - Average time of operations stored in the database - I tried Strawberry's solution and it works great. However, I need to improve the functionality, so that it covers new requirements.

I have a table looking like this:

id | creation time       | operation_type | service_id
1  | 2017-01-03 11:14:25 | INSTALL_START  | 1
2  | 2017-01-03 11:14:26 | INSTALL_START  | 2
3  | 2017-01-03 11:14:28 | INSTALL_END    | 2
4  | 2017-01-03 11:14:30 | INSTALL_END    | 1
5  | 2017-01-03 11:14:35 | UNINSTALL_START| 1
6  | 2017-01-03 11:14:40 | UNINSTALL_END  | 1
7  | 2017-01-03 11:15:00 | INSTALL_START  | 1
8  | 2017-01-03 11:15:10 | INSTALL_END    | 1

As you can see each service can be installed and uninstalled several times. My goal is to write SQL query to determine the average time of service installation.

However, I'm struggling with pairing the INSTALL_START and INSTALL_END operations for each service, since there is no field like operation_id to distinct between the operations. If the service is installed, uninstalled and then installed again it's still the same service_id I'm operating on.

In this case we can see that first installation of service 1 took 5 seconds, installation of service 2 took 2 seconds, and second installation of service 1 took 10 seconds, so the the query should return 5,66.

Community
  • 1
  • 1
CorrieSparrow
  • 521
  • 1
  • 6
  • 23
  • This sounds like a gaps and islands problem (above my pay grade unfortunately). – Tim Biegeleisen Jan 05 '17 at 14:47
  • Jameson makes a good point in respect of the design. Consider splitting operation type into two columns - one which records the type of operation, and one which records whether it's a start or a stop. – Strawberry Jan 05 '17 at 16:13

1 Answers1

1
DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,creation_time DATETIME NOT NULL
,operation_type VARCHAR(20) NOT NULL
,phase VARCHAR(12) NOT NULL
,service_id INT NOT NULL
);

INSERT INTO my_table VALUES
(1,'2017-01-03 11:14:25','INSTALL','START',1),
(2,'2017-01-03 11:14:26','INSTALL','START',2),
(3,'2017-01-03 11:14:28','INSTALL','END',2),
(4,'2017-01-03 11:14:30','INSTALL','END',1),
(5,'2017-01-03 11:14:35','UNINSTALL','START',1),
(6,'2017-01-03 11:14:40','UNINSTALL','END',1),
(7,'2017-01-03 11:15:00','INSTALL','START',1),
(8,'2017-01-03 11:15:10','INSTALL','END',1);

SELECT AVG(TIME_TO_SEC(b.creation_time)-TIME_TO_SEC(a.creation_time)) avg_diff
  FROM
     ( SELECT x.*
            , MIN(y.id) y_id 
         FROM my_table x 
         JOIN my_table y 
           ON y.service_id = x.service_id 
          AND y.id > x.id 
          AND y.operation_type = x.operation_type 
          AND y.phase = 'end' 
        WHERE x.phase = 'start'
          AND x.operation_type = 'install'
        GROUP 
           BY x.id
     ) a
  JOIN my_table b
    ON b.id = a.y_id;

+----------+
| avg_diff |
+----------+
|   5.6667 |
+----------+
Strawberry
  • 33,750
  • 13
  • 40
  • 57
  • Nice, but do you think rewriting the OP's table is a good solution? And in that case, wouldn't it be better to add instance id (of installation) and be done with it? – Jameson the dog Jan 05 '17 at 17:03
  • @Jamesonthedog Thank you. Yes. And No. – Strawberry Jan 05 '17 at 17:39
  • Fair enough I guess... As long as you feel good with it... have a nice weekend! – Jameson the dog Jan 05 '17 at 17:40
  • @Jamesonthedog I'm being slightly facetious. Yes, of course it would be a good idea to have an installation id - but that might not be practical, so I think that this is a reasonable compromise. But worse, my weekend's still a whole day away :-( – Strawberry Jan 05 '17 at 17:43
  • Think about it like this - my week started a day earlier! as to the question, I've always tried to answer inside the given parameters - maybe you can split the operation_type and use it like that? Slightly more complex query but matches the given structure – Jameson the dog Jan 05 '17 at 17:56
  • I edited it a little bit to reflect my schema. Instead of ``AND y.id > x.id AND y.operation_type = x.operation_type AND y.phase = 'end' WHERE x.phase = 'start' AND x.operation_type = 'install'`` I now have ``AND y.id > x.id AND x.operation_type = 'INSTALL_START' WHERE y.operation_type = 'INSTALL_END'`` and it seems to work. Didn't I miss anything? – CorrieSparrow Jan 08 '17 at 21:38
  • @CorrieSparrow It's just not a very scalable solution, but I *think* it's fine. – Strawberry Jan 10 '17 at 09:58