5

I have a table of over 5 million rows. When i perform a select query it is taking around 20 seconds.

SELECT CompUID,Weburl FROM `CompanyTable` WHERE (Alias1='match1' AND Alias2='match2' )OR Alias3='match3' OR Alias4='match4'

Here is the table structure:

CREATE TABLE `CompanyMaster` (
  `CompUID` int(11) NOT NULL AUTO_INCREMENT,
  `Weburl` varchar(150) DEFAULT NULL,
  `CompanyName` varchar(200) DEFAULT NULL,
  `Alias1` varchar(150) DEFAULT NULL,
  `Alias2` varchar(150) DEFAULT NULL,
  `Alias3` varchar(150) DEFAULT NULL,
  `Alias4` varchar(150) DEFAULT NULL,  
  `Created` datetime DEFAULT NULL,
  `LastModified` datetime DEFAULT NULL,  
  PRIMARY KEY (`CompUID`),
  KEY `Alias` (`Alias1`,`Alias2`,`Alias3`,`Alias4`)
) ENGINE=InnoDB AUTO_INCREMENT=5457968 DEFAULT CHARSET=latin1

Here is the EXPLAIN from that query:

--------+------------------------------------------------------------------------------------------------------+
| id | select_type | table        | type  | possible_keys | key    | key_len  | ref  | rows    | Extra         |
+----+-------------+----------+-------+---------------+------+---------+------+---------+----------------------+
|  1 | SIMPLE      | CompanyTable | ALL   |     Alias     | NULL   | NULL     | NULL | 5255929 |  Using where  |
+----+-------------+----------+-------+---------------+------+---------+------+---------+----------------------+

I used the composite index Alias (Alias1,Alias2,Alias3,Alias4). But i believe it's not the best one. Please suggest me the right indexing for this select query lookup.

Sandy
  • 63
  • 1
  • 5
  • 1
    The `OR Alias3='match3' OR Alias4='match4'` is what is forcing the full table scan. This clause is essentially unindexed. In order to optimize this query, you'll need to add an index on Alias3 and Alias4. – drew010 Mar 02 '16 at 05:48
  • You can check the performance of your query by providing it a hint like so: `SELECT CompUID,Weburl FROM `CompanyTable` use index(Alias) ...`. Does that make a difference in speeds? – zedfoxus Mar 02 '16 at 05:57
  • share counts of following query: "select count(*) from `CompanyTable` where Alias1='match1' AND Alias2='match2';" and "select count(*) from `CompanyTable` where Alias1='match3'" and "select count(*) from `CompanyTable` where Alias1='match4". – Zafar Malik Mar 02 '16 at 06:26

3 Answers3

4

For the query engine to use a column in a composite index the columns to the left must be satisfied first. That is, columns must be used as restrictions that reduce candidate rows as read left-to-right.

The OR alias3 (or alias4) clauses violate this rule as it says "I don't care what the left parts (alias1 or alias2 (or alias3)) were, because I am not dependent upon them".

A full table scan is then required to see if there are any alias3 (or alias4) values that meet the conditions.

The potentially useful indices in this condition would be:

  • INDEX(alias1, alias2): alias1 AND alias2 covers this composite index
  • INDEX(alias3)
  • INDEX(alias4)

Actual stats and plan selection require further investigation - but at least now the query planner has something to work with.


That being said - and I'm not sure what the role of an "alias" is - it may make sense to normalize the table. The following does change the semantics slightly as it drops the "alias position" (which could be added back in) and should be verified for semantic correctness.

CREATE TABLE `CompanyMaster` (
  `CompUID` int(11) NOT NULL AUTO_INCREMENT
 ,`CompanyName` varchar(200) DEFAULT NULL
 ,PRIMARY KEY (`CompUID`)
)

-- (This establishes a unique alias-per-company, which may be incorrect.)
CREATE TABLE `CompaniesAliases` (
  `CompUID` int(11) NOT NULL
 ,`Alias` varchar(150) NOT NULL
  -- Both CompUID and Alias appear in 'first' positions:
  --   CompUID for Join, Alias for filter
 ,PRIMARY KEY (`CompUID`, `Alias`)
 ,KEY (`Alias`)
 -- Alternative, which may change plan selection by eliminating options:
 -- ,PRIMARY KEY (`Alias`, `CompUID`) -- and no single KEY/index on Alias or CompUID
 ,FOREIGN KEY(CompUID) REFERENCES CompanyMaster(CompUID)
)

It can then be queried approximately similar to the original, being different in that it does not care which "alias" matches which value:

-- AND constructed by joins (could also use GROUP BY .. HAVING COUNT)
SELECT c.CompUID FROM `CompanyTable` c
JOIN `CompaniesAliases` ac1
ON ac1.CompUID = c.CompUID AND Alias = 'match1'
JOIN `CompaniesAliases` ac2
ON ac2.CompUID = c.CompUID AND Alias = 'match2'

-- OR constructed by union(s)
UNION
SELECT c.CompUID FROM `CompanyTable` c
JOIN `CompaniesAliases` ac1
ON ac1.CompUID = c.CompUID AND (Alias = 'match3' OR Alias = 'match4')

I would expect such a query to be implemented efficiently in SQL Server - YMMV with MySQL.

Community
  • 1
  • 1
user2864740
  • 60,010
  • 15
  • 145
  • 220
0

I would suggest the following solution, create a table with a complex_alias_field. It increase a little bit your data, and your data is now redundant, but i think it's a simple straightforward solution.

1. Create Table

CREATE TABLE `CompanyMaster` (
`CompUID` int(11) NOT NULL AUTO_INCREMENT,
  `Weburl` varchar(150) DEFAULT NULL,
  `CompanyName` varchar(200) DEFAULT NULL,
  `Alias1` varchar(150) DEFAULT NULL,
  `Alias2` varchar(150) DEFAULT NULL,
  `Alias3` varchar(150) DEFAULT NULL,
  `Alias4` varchar(150) DEFAULT NULL,
  `Created` datetime DEFAULT NULL,
  `LastModified` datetime DEFAULT NULL,
  `ComplexAliasQuery` BOOLEAN DEFAULT FALSE,
  PRIMARY KEY (`CompUID`),
  KEY `Alias` (`Alias1`,`Alias2`,`Alias3`,`Alias4`),
  KEY `AliasQuery` (`ComplexAliasQuery`)
) ENGINE=InnoDB AUTO_INCREMENT=5457968 DEFAULT CHARSET=latin1;

2. Fill your new Field ComplexAliasQuery

UPDATE CompanyMaster set ComplexAliasQuery = TRUE WHERE (Alias1='match1' AND Alias2='match2' )OR Alias3='match3' OR Alias4='match4';

3. For updating one of the Fields Alias1,Alias2,Alias3,Alias4

For Updating just fill the ComplexAliasQuery, too. You could do this perhaps with an Trigger http://dev.mysql.com/doc/refman/5.7/en/trigger-syntax.html or in your code, if your can't use a trigger, cause you are running a cluster.

4. Your Simple Query is at the end

SELECT CompUID,Weburl FROM `CompanyMaster` WHERE ComplexAliasQuery IS TRUE;

with hitting the index

+----+-------------+---------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table         | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+---------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | CompanyMaster | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+---------------+------+---------------+------+---------+------+------+-------------+

Another Solution

If you dont like the field in your table CompanyMaster, you can outsource it in a new Table and called it IndexAliasCompanyMaster and then just join with this table.

Kordi
  • 2,405
  • 1
  • 14
  • 13
0

None of the above. Re-design the schema.

If the 4 Aliases are just synonyms for a company, do not splay an array of them in the table, move them to another table. (user2864740 got half way there; I am saying to go all the way.)

CREATE TABLE `CompanyMaster` (
  `CompUID` int(11) NOT NULL AUTO_INCREMENT,
  `Weburl` varchar(150) DEFAULT NULL,
  `CompanyName` varchar(200) DEFAULT NULL,
  `Created` datetime DEFAULT NULL,
  `LastModified` datetime DEFAULT NULL,  
  PRIMARY KEY (`CompUID`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `CompaniesAliases` (
  `CompUID` int(11) NOT NULL,
  `Alias` varchar(150) NOT NULL,
  PRIMARY KEY (Alias)  -- Assuming no two companies can have the same Alias
  KEY (CompUID)
) ENGINE=InnoDB;

(You really should convert all tables to InnoDB.)

Now, your original query becomes

SELECT  CompUID, Weburl
    FROM  `CompanyTable`
    JOIN  CompaniesAliases  USING(CompUID)
    WHERE  Alias IN ('match1', 'match2', 'match3', 'match4');

and it will run a lot faster.

If you need to show the company name and its aliases, consider

SELECT CompanyName,
       GROUP_CONCAT(Alias) AS 'Also known as'
    FROM  `CompanyTable`
    JOIN  CompaniesAliases  USING(CompUID)
    WHERE ...
    GROUP BY CompUID;
Rick James
  • 135,179
  • 13
  • 127
  • 222