0

I have a table called lead_history and when I am calculating averages from it the query is taking over a minute and a half. The query is parsing through roughly 15k rows. I have added indexes based on the below query but it still seems to take a really long time. Any help would be much appreciated.

 SELECT AVG(TIMEDIFF(tq.ts, tv.ts)) / 60 avg_time_to_quote 
   FROM lead_history tv 
   JOIN lead_history tq 
     ON tv.agency_id = tq.agency_id  
  WHERE tv.new_status = 'Verified' 
    AND tq.new_status = 'Quoted' 
    AND tv.agency_id = '$agency_id' 
    AND tv.ts > DATE_SUB(NOW(), INTERVAL 30 DAY) 
    AND tq.ts > DATE_SUB(NOW(), INTERVAL 30 DAY) 
  GROUP 
     BY tv.agency_id
      , tq.agency_id

Table Structure

show create table lead_history;
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table        | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| lead_history | CREATE TABLE `lead_history` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `agency_id` varchar(255) NOT NULL,
  `old_status` varchar(64) DEFAULT NULL,
  `new_status` varchar(64) DEFAULT NULL,
  `ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `contact_id` varchar(255) NOT NULL DEFAULT '',
  `alter_type` varchar(255) NOT NULL DEFAULT '',
  `last_mod_by` varchar(64) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `avg_index` (`old_status`,`new_status`,`agency_id`,`ts`)
) ENGINE=InnoDB AUTO_INCREMENT=14041 DEFAULT CHARSET=latin1 |
+--------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Indexes

mysql> show indexes from lead_history;

+--------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name  | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| lead_history |          0 | PRIMARY   |            1 | id          | A         |       13922 |     NULL | NULL   |      | BTREE      |         |               |
| lead_history |          1 | avg_index |            1 | old_status  | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| lead_history |          1 | avg_index |            2 | new_status  | A         |          46 |     NULL | NULL   | YES  | BTREE      |         |               |
| lead_history |          1 | avg_index |            3 | agency_id   | A         |          48 |     NULL | NULL   |      | BTREE      |         |               |
| lead_history |          1 | avg_index |            4 | ts          | A         |        1330 |     NULL | NULL   |      | BTREE      |         |               |
+--------------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+`

Explain `

mysql> explain select avg(UNIX_TIMESTAMP(tq.ts) - UNIX_TIMESTAMP(tv.ts)) / 60 as avg_time_to_quote from lead_history tv join lead_history tq on tv.agency_id = tq.agency_id  WHERE tv.old_status not like 'Verified' and tq.new_status = 'Verified' and tv.agency_id and tv.agency_id = 'XXXXXXXXXXXX35';
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+-----------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key       | key_len | ref  | rows  | filtered | Extra                                                           |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+-----------------------------------------------------------------+
|  1 | SIMPLE      | tq    | NULL       | index | NULL          | avg_index | 395     | NULL | 13922 |     1.00 | Using where; Using index                                        |
|  1 | SIMPLE      | tv    | NULL       | index | NULL          | avg_index | 395     | NULL | 13922 |     8.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+-------+---------------+-----------+---------+------+-------+----------+-----------------------------------------------------------------+
2 rows in set, 1 warning (0.02 sec)

mysql>

`

  • 1
    See the answer for this Q: http://stackoverflow.com/questions/3528219/mysql-how-to-get-the-difference-between-two-timestamps-in-seconds It says substracting `UNIX_TIMESTAMP(a)-UNIX_TIMESTAMP(b)` can be faster than using TIMEDIFF on TIMESTAMP fields. – Todor Simeonov May 20 '17 at 12:26
  • WOW! I can't believe how much that made a difference. Cut query by a minute. 30 seconds still seems like quite a long time though. Any other thoughts? – James Buchert May 20 '17 at 12:36
  • You have DATE_SUB() functions with intervals. How about substracting from UNIX_TIMESTAMP(TIMESTAMP) an integer = 30 days * 24 hours * 3600 secs ? – Todor Simeonov May 20 '17 at 12:40
  • 1
    Incidentally, I fine SHOW CREATE TABLE much easier to read than DESCRIBE – Strawberry May 20 '17 at 12:43
  • @Strawberry That was show create table. **NOTE** - I see what you mean, that was definitely not show create table, sorry for the mix up, changed it. – James Buchert May 20 '17 at 12:47
  • @JamesBuchert about my last comment for DATE_SUB: Before this big query you can make 1 simple just to get UNIX_TIMESTAMP(NOW()) as integer in your code. The reason to call SQL's NOW(), rather than your server's now() is to get the value with the correct time zone defined in DB. Then substract from it (30*24*3600) and place the resulting integer for comparison, beacause you use it twice. – Todor Simeonov May 20 '17 at 12:51
  • @TodorSimeonov okay, will give that a try now. I am assuming I have to convert it back to a mysql datetime before I do the tv.ts > CONVERTED_DATE_FROM_UNIX_TIMESTAMP? – James Buchert May 20 '17 at 12:53
  • is using a join faster than just checking both the agency ids in the where condition against the constant? – inarilo May 20 '17 at 12:55
  • @JamesBuchert : `AND UNIX_TIMESTAMP(tv.ts) > MyPreCalculatedInt AND UNIX_TIMESTAMP(tq.ts) > MyPreCalculatedInt` - just in the way you replaced TIMEDIFF. – Todor Simeonov May 20 '17 at 12:58
  • @TodorSimeonov Would this work as well? `select avg(UNIX_TIMESTAMP(tq.ts) - UNIX_TIMESTAMP(tv.ts)) / 60 as avg_time_to_quote from lead_history tv join lead_history tq on tv.agency_id = tq.agency_id WHERE tv.new_status = 'Verified' and tq.new_status = 'Quoted' and tv.agency_id = 'XXXXXXXXXXX35' and tv.ts > '2017-04-20 09:03:47' and tq.ts > '2017-04-20 09:03:47' group by tv.agency_id,tq.agency_id;` – James Buchert May 20 '17 at 13:05
  • So just for fun, what happens if you rearrange the index: `agency_id`,`ts`, `new_status`, `old_status` - remember to prevent caching – Strawberry May 20 '17 at 13:06
  • @JamesBuchert It could be slower, it could be not. I'm not sure. It depends on how smart is SQL. Maybe on the first parse the system will see that the date is string constant, parse it once to a number and then internaly use it as a number. Not sure. It will be nice if you try both - string date constant and numeric (UNIX_TIMESTAMP) date constant and compare the speed. – Todor Simeonov May 20 '17 at 13:08
  • @Strawberry no change in query time. – James Buchert May 20 '17 at 13:11
  • @JamesBuchert Next thing you have to consider is can you change columns `old_status` and `new_status` to integers. Comapring these string constants is also a slow job. – Todor Simeonov May 20 '17 at 13:16
  • @TodorSimeonov String Constant - 30 seconds Unix Timestamp with MYSQL doing calculation of timestamp - +5 seconds longer Unix Timestamp with PHP doing calculation of timestamp - +30 seconds longer – James Buchert May 20 '17 at 13:17
  • @TodorSimeonov that is an option but will take a while to redo a few tables, add a reference table for status types etc. I will look into it though. – James Buchert May 20 '17 at 13:18
  • @inarilo what do you mean? sorry still learning mysql.... – James Buchert May 20 '17 at 13:26
  • I mean is it any faster if you use `SELECT... FROM lead_history tv, lead_history tq WHERE ... AND tv.agency_id = '$agency_id' AND tq.agency_id = '$agency_id' ... GROUP ...` – inarilo May 20 '17 at 13:35

1 Answers1

1
  • The indexes you have are useless because of the order -- add INDEX(agency, new_status, ts)

  • Doing the GROUP BY... Probably a good idea, but this answer is not pursuing that.

  • Change "status" columns from bulky VARCHAR to a 1-byte ENUM.

Consider a difference approach: Two tables -- one with an audit trail (or transaction history); one with the current status. With that, there would not need to be a JOIN to do the diff. It would, on the other hand, require either

  • INSERT into the history table plus a UPDATE the status table; or
  • a TRIGGER on the history table to update the status table.
Rick James
  • 135,179
  • 13
  • 127
  • 222