1

I have this working SQL query but it almost makes my DB crash:

SELECT MASTER.master_id, 
       MASTER.master_summary, 
       MASTER.master_start, 
       MASTER.master_end, 
       MASTER.master_risk, 
       MASTER.master_source, 
       MASTER.master_veto, 
       master.master_tags, 
       NULL AS HAS_CE, 
       C2C.c2c_customer 
FROM      `cer_master` MASTER 
LEFT JOIN `cer_c2customer` C2C 
              ON ( C2C.c2c_id = MASTER.master_id AND C2C.c2c_source = MASTER.master_source ) 

WHERE  ( MASTER.master_id NOT LIKE 'TAV%' ) 
        AND (( MASTER.master_class <> 'type2' ) OR ( MASTER.master_class <> 'type3' )) 
        AND ( MASTER.master_status <> 'Cancelled' ) 
        AND ( MASTER.master_end >= Now() AND MASTER.master_start >= Date_sub(Now(), INTERVAL 1 day) ) 

If I try to run this on phpMyAdmin I have to literally wait for 5min and get this result: 3,699 total, Query took 0.9358 sec

I have indexed MASTER.master_id, MASTER.master_start, MASTER.master_end, MASTER.master_source aswell as c2c.c2c_id, C2C.c2c_source and C2C.c2c_customer but it doesn't seem to help.

Additional Info: cer_master MASTER table has 277,502 rows and cer_c2customer C2C table has 72,788 rows.

Can someone help me optimize this query? I need it badly and cannot think of another way.

EDIT: Results from the EXPLAIN query:

+----+-------------+--------+-------+-------------------------------------------------------+---------------------------------+---------+------+-------+-------------+
| id | select_type | table  | type  | possible_keys                                         | key                             | key_len | ref  | rows  | Extra       |
+----+-------------+--------+-------+-------------------------------------------------------+---------------------------------+---------+------+-------+-------------+
| 1  | SIMPLE      | MASTER | range | CHM_MASTER_SCHEDULED_START_DATE,CHM_MASTER_SCHEDUL... | CHM_MASTER_SCHEDULED_START_DATE | 4       | NULL | 5042  | Using where |
+----+-------------+--------+-------+-------------------------------------------------------+---------------------------------+---------+------+-------+-------------+
| 1  | SIMPLE      | C2C    | ALL   | CER_C2C_CHANGE_ID                                     | NULL                            | NULL    | NULL | 72788 |             |
+----+-------------+--------+-------+-------------------------------------------------------+---------------------------------+---------+------+-------+-------------+

Table  Create Table
master  CREATE TABLE `master` (
 `ID` int(11) NOT NULL AUTO_INCREMENT,
 ` MASTER_LAST_MODIFIED_DATE` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 ` MASTER_SOURCE` varchar(16) NOT NULL,
 ` MASTER_ID` varchar(16) NOT NULL,
 ` MASTER_SUMMARY` text NOT NULL,
 ` MASTER_NOTES` text NOT NULL,
 ` MASTER_SERVICE` varchar(255) NOT NULL,
 ` MASTER_SITE` text NOT NULL,
 ` MASTER_START` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 ` MASTER_END` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
 ` MASTER_DEPARTMENT_FLAG` varchar(8) NOT NULL,
 ` MASTER_RISK` int(8) NOT NULL DEFAULT '1',
 ` MASTER_IMPACT_LEVEL` varchar(64) NOT NULL,
 ` MASTER_TOOL_STATUS` varchar(32) NOT NULL,
 ` MASTER_IMPACT_RISK_NOTES` text NOT NULL,
 ` MASTER_CALENDAR_WEEK` varchar(16) NOT NULL,
 ` MASTER_TAGS` varchar(1024) NOT NULL,
 ` MASTER_VETO` tinyint(1) NOT NULL DEFAULT '0',
 ` MASTER_LAYER_TAGS` text NOT NULL,
 ` MASTER_ORAKEL_ID` int(11) NOT NULL DEFAULT '0',
 ` MASTER_USED_TEMPLATE` text NOT NULL,
 PRIMARY KEY (`ID`),
 KEY ` MASTER_ID` (` MASTER_CHANGE_ID`),
 KEY ` MASTER_LAST_MODIFIED_DATE` (` MASTER_LAST_MODIFIED_DATE`),
 KEY ` MASTER_SERVICE` (` MASTER_SERVICE`),
 KEY ` MASTER_START` (` MASTER_START`),
 KEY ` MASTER_END` (` MASTER_END_`),
 KEY ` MASTER_SOURCE` (` MASTER_SOURCE`)
) ENGINE=MyISAM AUTO_INCREMENT=278315 DEFAULT CHARSET=utf8

and this is show create table from c2c_customer: cerberus_change2customer

CREATE TABLE `c2c_customer` (
 `ID` int(11) NOT NULL AUTO_INCREMENT,
 `C2C_SOURCE` text NOT NULL,
 `C2C_ID` text NOT NULL,
 `C2C_CUSTOMER` text NOT NULL,
 `C2C_LAST_MODFIED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
 PRIMARY KEY (`ID`),
 FULLTEXT KEY `C2C_ID` (`C2C_ID`),
 FULLTEXT KEY `C2C_CUSTOMER` (`C2C_CUSTOMER`)
) ENGINE=MyISAM AUTO_INCREMENT=516044 DEFAULT CHARSET=utf8
sardine
  • 157
  • 1
  • 17
  • What indexes have you used? What is your clustered index? – Ben Jul 18 '16 at 11:31
  • you need to post your explain – e4c5 Jul 18 '16 at 11:32
  • @Ben I'm afraid I'm not an expert on indexing, all I did was going into 'structure'-view of the db and then clicking `add index`on the column from the`more`-menu. Can I do this differently/more efficiently? – sardine Jul 18 '16 at 11:37
  • @e4c5 can you please specify this? What should I post additionally? Sorry, I'm not an export with DB administration :( – sardine Jul 18 '16 at 11:38
  • Are they separate indexes on each field, or one index on all fields? What other indexes are there? – Ben Jul 18 '16 at 11:38
  • @Ben honestly, I don't know! Would you know how I can find this out? All I know is that there is a write-process 2times a day with about 6000 updated entries. This write-process didn't slow down after I indexed the specific columns. But it did make a few queries A LOT faster than before, so it definitely did do something :) - but nothing that helps me with this left join unfortunately.. – sardine Jul 18 '16 at 11:40
  • http://stackoverflow.com/questions/1498777/how-do-i-show-the-schema-of-a-table-in-a-mysql-database – Ben Jul 18 '16 at 11:44
  • @Ben ok I did this! Should I include it in the initial post? – sardine Jul 18 '16 at 11:50
  • @Ben the indexed columns have 'MUL' as key. – sardine Jul 18 '16 at 11:51
  • http://dev.mysql.com/doc/refman/5.7/en/explain.html – e4c5 Jul 18 '16 at 11:51
  • @e4c5 the indexed columns have 'MUL' as key. – sardine Jul 18 '16 at 11:51
  • `EXPLAIN [your query here]` - post the result, in addition to proper CREATE TABLE statements. Edit your question accordingly. – Strawberry Jul 18 '16 at 11:51
  • @Strawberry thank you very much! I included the EXPLAIN statement in the original post. However the CREATE TABLE statement is very extensive :/ should I really put that all in the OP? – sardine Jul 18 '16 at 11:59
  • I really think you should! – Strawberry Jul 18 '16 at 12:00
  • 1
    Not part of the question but this part in the where clause is true for all rows or am I wrong: `( MASTER.master_class <> 'type2' ) OR ( MASTER.master_class <> 'type3' )` – Jan Zeiseweis Jul 18 '16 at 12:02
  • @JanZeiseweis hm no it shouldn't, because there are a lot of types! And that part should mean 'give me all the rows except type2- and type3-rows'. This should be correct, yes? – sardine Jul 18 '16 at 12:07
  • 1
    If you want to exclude these types, better use `where master_class not in ('type2', 'type3')` The way you wrote it means: Give me all rows where the master_class is not type2 OR where the master_class is not type3. So if the master_class is type2 the second condition would apply. Or connect both conditions with `and` instead of or. – Jan Zeiseweis Jul 18 '16 at 12:09
  • Did you try to remove the entire `where` clause and add `limit 100` instead? – Jan Zeiseweis Jul 18 '16 at 12:11
  • @Strawberry okay I shortened it a bit with the irrelevant parts and copypasted it to the original post. Thank you for your help! – sardine Jul 18 '16 at 12:15
  • @JanZeiseweis ok that sounds far more elegant with 'NOT IN' and I'll try it. But didn't I defy the possibility you wrote by putting parentheses around, like so: `AND (( MASTER.master_class <> 'type2' ) OR ( MASTER.master_class <> 'type3' ))` ? – sardine Jul 18 '16 at 12:17
  • I'm no expert on this stuff, but I'd be tempted to remove all indexes except the PKs, and then add the following composite indexes instead... `(master_id, master_class,master_status,master_start)` and `(c2c_id,c2c_source)` (although seeing as c2c has all the rows, it would have been nice to see that table too) – Strawberry Jul 18 '16 at 12:26
  • @Strawberry I didn't know about composite indexes. I'll google them up and see if I can implement them. Thank you for your help! Btw, i added the show create table of c2c in the original post too, for good measure :) – sardine Jul 18 '16 at 12:40

3 Answers3

1

Your index should be on all columns, not separate indexes for each column.

For example:

ALTER TABLE `cer_c2customer` ADD INDEX `cer_c2customer_ID_SOURCE_CUSTOMER` (c2c_id, c2c_source, c2c_customer)

This means that the one index can be used to locate the data and also supply all of the columns required from this table in the query.

In addition you probably want the clustering index on cer_master to be the start date or end date.

Ben
  • 34,935
  • 6
  • 74
  • 113
  • thank you so much for the help. I hope this msg reached you in time because - ugh, I don't know how to proceed! I made a copy of my table and did what you said, altered it and added the index on those three columns. But what should I do now? Can I just run my old query on this copied table and it should go faster with this new index? Or do I somehow have to refer to it? – sardine Jul 18 '16 at 14:16
  • Nevermind! It works like a CHARM! Thank you so much, the query now runs in a few milliseconds! – sardine Jul 18 '16 at 14:25
  • @sardine Check that that's not because of caching by including the SQL_NO_CACHE keyword. – Strawberry Jul 18 '16 at 14:46
  • @Ben, MyISAM has no "clustering indexes". – Rick James Jul 18 '16 at 14:59
  • @RickJames, good point, probably don't want to use ISAM then :-) – Ben Jul 18 '16 at 15:04
  • For multiple reasons, it would be better to use InnoDB. However, your "covering" index is likely to be the most significant speedup. – Rick James Jul 18 '16 at 15:09
1

One trick is to order your where clause properly

WHERE  ( MASTER.master_id NOT LIKE 'TAV%' ) 
        AND (( MASTER.master_class <> 'type2' ) OR ( MASTER.master_class <> 'type3' )) 
        AND ( MASTER.master_status <> 'Cancelled' ) 
        AND ( MASTER.master_end >= Now() AND MASTER.master_start >= Date_sub(Now(), INTERVAL 1 day) )

Move the where clause up which is fastest and has the samllest result set.

For example if you know this condition has small result set move it up

( MASTER.master_status <> 'Cancelled' )

So, the query becomes

WHERE  ( MASTER.master_status <> 'Cancelled' )  
        AND (( MASTER.master_class <> 'type2' ) OR ( MASTER.master_class <> 'type3' )) 
        AND ( MASTER.master_id NOT LIKE 'TAV%' )
        AND ( MASTER.master_end >= Now() AND MASTER.master_start >= Date_sub(Now(), INTERVAL 1 day) )
shikhar
  • 2,431
  • 2
  • 19
  • 29
  • The order of clauses in the `WHERE` clause does not really matter. The optimizer will pick one index (or no index) based on statistics; this is by far the most important for performance. Then it will evaluate the rest of the `WHERE` for each row not yet filtered out. At that point, the order matters, but it is not significant enough to matter. – Rick James Jul 18 '16 at 15:07
1

In order of importance:

  1. Fix the OR that is always TRUE. (Bug fix, not optimization)
  2. Add Ben's composite (and covering) index.
  3. Change to InnoDB.
  4. INDEX(master_start), INDEX(master_end) -- Yes, two separate, not one composite, index. The optimizer may pick one of them and benefit.
  5. Order the WHERE clause, as Shikhar suggests. (Even after doing the rest, this will provide only an insignificant improvement.)
Rick James
  • 135,179
  • 13
  • 127
  • 222