I have a table Assets
on an InnoDB engine defined as:
CREATE TABLE Assets (
qid SMALLINT(5) NOT NULL,
sid BIGINT(20) NOT NULL AUTO_INCREMENT,
...
PRIMARY KEY (sid,qid),
KEY sid (sid)
);
I'm running the following query:
SELECT COUNT(*) FROM Assets WHERE sid>10000;
On my machine, this query takes about 30 seconds with 2 millions entries in the table. Now, if I modify the query to use indices, the results vary widely:
SELECT COUNT(*) FROM Assets USE INDEX(<index>) WHERE sid>10000;
NO INDEX
: No explicitUSE INDEX
, i.e., the firstSELECT
query : 30 secsKEY sid (sid)
: 1.5 secsKEY cid (sid,qid)
: 1.5 secsPRIMARY
: I usedUSE INDEX(PRIMARY)
within the query. : 30 secs
So these are my questions:
I thought a query would automatically use the primary key as its index, based on this. And yet there's a major difference between
USE INDEX (cid)
andNO INDEX
. What's the difference? Also, how I do explicitly give the primary key as the index?If
NO INDEX
doesn't actually use the primary key as an index, what doesUSE INDEX(PRIMARY)
do that causes it to have the same running time asNO INDEX
?Is there a difference (not just performance wise) between
USE INDEX(sid)
andUSE INDEX(cid)
in a query that only filters bysid
?
Forgive the long post, but I wanted to let it be open to discussion.
Ok, here's what I have found out so far:
First of all, I'm told the key setup should be either: PRIMARY KEY(qid,sid), KEY(sid)
or PRIMARY KEY(sid,qid), KEY(qid)
. I don't really understand the difference. If someone does, please let me know.
Secondly, the KEY sid
(sid
) references far fewer index pages than a larger key, so it tends to be faster. As for the difference between using the PRIMARY KEY as the index and a proper KEY (even if they use the same fields), I've been told it's something like this:
Primary keys index the entire table data with the fields of the primary key. That means that the PRIMARY KEY and the data are stored together. So a query using the PRIMARY KEY would have to go through the entire table data, which even indexed would bog down on large uncacheable tables.
With discrete keys, the number of rows may be the same, but a much smaller index (consisting of the indicated fields) is scanned, which hits a smaller number of disk blocks, and hence runs much faster. I'm assuming this is also the reason for the difference in using USE INDEX(cid)
and using the primary key as index, both of which have the same fields.