Indices and using primary keys as indices in MySQL

Question

I have a table Assets on an InnoDB engine defined as:

CREATE TABLE Assets (
qid SMALLINT(5) NOT NULL,
sid BIGINT(20) NOT NULL AUTO_INCREMENT,
...
PRIMARY KEY (sid,qid),
KEY sid (sid)
);

I'm running the following query:

SELECT COUNT(*) FROM Assets WHERE sid>10000;

On my machine, this query takes about 30 seconds with 2 millions entries in the table. Now, if I modify the query to use indices, the results vary widely:

SELECT COUNT(*) FROM Assets USE INDEX(<index>) WHERE sid>10000;

NO INDEX : No explicit USE INDEX, i.e., the first SELECT query : 30 secs
KEY sid (sid) : 1.5 secs
KEY cid (sid,qid) : 1.5 secs
PRIMARY : I used USE INDEX(PRIMARY) within the query. : 30 secs

So these are my questions:

I thought a query would automatically use the primary key as its index, based on this. And yet there's a major difference between USE INDEX (cid) and NO INDEX. What's the difference? Also, how I do explicitly give the primary key as the index?
If NO INDEX doesn't actually use the primary key as an index, what does USE INDEX(PRIMARY) do that causes it to have the same running time as NO INDEX?
Is there a difference (not just performance wise) between USE INDEX(sid) and USE INDEX(cid) in a query that only filters by sid?

Forgive the long post, but I wanted to let it be open to discussion.

Ok, here's what I have found out so far:

First of all, I'm told the key setup should be either: PRIMARY KEY(qid,sid), KEY(sid) or PRIMARY KEY(sid,qid), KEY(qid). I don't really understand the difference. If someone does, please let me know.

Secondly, the KEY sid(sid) references far fewer index pages than a larger key, so it tends to be faster. As for the difference between using the PRIMARY KEY as the index and a proper KEY (even if they use the same fields), I've been told it's something like this:

Primary keys index the entire table data with the fields of the primary key. That means that the PRIMARY KEY and the data are stored together. So a query using the PRIMARY KEY would have to go through the entire table data, which even indexed would bog down on large uncacheable tables.

With discrete keys, the number of rows may be the same, but a much smaller index (consisting of the indicated fields) is scanned, which hits a smaller number of disk blocks, and hence runs much faster. I'm assuming this is also the reason for the difference in using USE INDEX(cid) and using the primary key as index, both of which have the same fields.

Just a side note: if you have `PRIMARY KEY (sid,qid)` then `KEY sid (sid)` makes no sense — zerkms, Apr 18 '12 at 08:04
PS: put EXPLAIN for `SELECT COUNT(*) FROM Assets WHERE sid>10000` please — zerkms, Apr 18 '12 at 08:07
how much rows with `sid>10000` out of total 2M? If it is more than ~30% then mysql decides that fullscan is cheaper. And that is why forcing index helps — zerkms, Apr 18 '12 at 08:11
Yeah, it's more than 30%. If I set it around the halfway mark (1 million) the query takes about 20 seconds. — Karthik V, Apr 18 '12 at 08:14
If you decrease the result set (by specifying bigger value in `sid>10000`) to about 25-30% - does it work faster without hints? — zerkms, Apr 18 '12 at 08:16
Here's the `EXPLAIN` for the `SELECT COUNT(*) FROM Assets WHERE sid>10000` `id | select_type | table | possible_keys | key | key_len | ref | rows | Extra` `1 | SIMPLE | Assets| PRIMARY | 8 | NULL | NULL| 1087364 | Using where; Using index` — Karthik V, Apr 18 '12 at 08:18
It shouldn't affect it in any way, but what if you use `COUNT(sid)` instead? just curious, because it shouldn't change anything — zerkms, Apr 18 '12 at 08:21
No change really. And even if it did, I really need the entire tuple. :) — Karthik V, Apr 18 '12 at 08:23
@zerkms : About your first comment, would mind telling me the what difference the order of the fields in a composite primary key would make? Like between `PRIMARY KEY(qid,sid)` and `PRIMARY KEY(sid,qid)` — Karthik V, Apr 23 '12 at 06:03

Luke · Answer 1 · 2012-05-11T16:31:53.860

From my experience, having an index which is a subset of another one tends to slow things down.. but you mileage may vary, since you have to consider a lot of things when dealing with indexes..

For instance, if you read often and rarely change data, then having many indexes might help you more; if your operations involve a lot of insert/update/delete then having too many indexes might slow you down quite a bit.

If your primary key is (sid, qid) then I see no fit on having another key (sid) which the engine can probably retrieve as a prefix of the PK. I'd rather add an index on qid if I'm going do leverage it - that is if I have some queries filtering or sorting on that field or if I have some JOINs on that field..

As of the order of the fields on the primary key, I usually try to determine how they will be used in my queries: if all my queries make use of sid and some use both sid and qid, then choose (sid,qid); if they all use qid and only some also use sid, then choose (qid, sid); if they happen to use either sid or qid, then have a PK (sid, qid) and another key (qid), so that queries using both fields will be using your PK, the same will happen for queries using just sid, and finally those using just qid will be using the (qid) key.

I'm just a bit confused about the use(primary) forcing mysql not to use indexes, but that might be something (a bug?) related to your mysql version..

Here you can find some hints on index hints: http://dev.mysql.com/doc/refman/5.1/en/index-hints.html

In general try not to poke too much with index hints, the optimizer usually does a good job! if it doesn't, probably there is a flaw somewhere or it simply considered that a table scan is faster because indexes are not selective enough.

Furthermore, you might sometimes be in need of a table optimization to refresh index statistics.. but since you're using InnoDB it is probably not the case...

HTH

See also this [link](http://www.databasejournal.com/features/mysql/article.php/1382791/Optimizing-MySQL-Queries-and-Indexes.htm) about mysql query optimization; it is clear and well written in my opinion — Luke, May 08 '12 at 00:30

Indices and using primary keys as indices in MySQL

1 Answers1