
I'm a complete newbie with MySQL indexes. I have several MyISAM tables on MySQL 5.0.x with utf8 charsets and collations, each holding 100k+ records. The primary keys are generally integers. Many columns in each table may have duplicate values.

I need to quickly count, sum, average, or otherwise perform custom calculations on any number of fields in each table or joined on any number of others.

I found this page giving an overview of MySQL index usage: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html, but I'm still not sure I'm using indexes right. Just when I think I've made the perfect index out of a collection of fields I want to calculate against, I get the "index must be under 1000 bytes" error.

Can anyone explain how to most efficiently create and use indexes to speed up queries?

Caveat: upgrading MySQL is not possible in this case. I'm using Navicat Lite for DB administration, but this app isn't required.

bob-the-destroyer
  • only put an index on fields that you would like to search against in a WHERE clause, not ones that you want to sum or average. – dqhendricks Jan 09 '11 at 03:44
  • @dqhendricks: that's the impression I get from MySQL's doc page linked above. But I will often have more than one field in `WHERE`, and adding more than one field to an index often throws this error. – bob-the-destroyer Jan 09 '11 at 03:50
  • if you have too many WHERE fields, you may be structuring your tables wrong; for instance, adding a bunch of fields for attributes instead of having a separate attributes table that you link to your main table using a foreign key and JOIN queries. – dqhendricks Jan 09 '11 at 03:53
  • Also, it is often better to index the main search field as opposed to indexing them all. – dqhendricks Jan 09 '11 at 04:17
  • _5.0 is so out of date; I recommend not reading this Q&A_. – Rick James Mar 08 '21 at 01:54

4 Answers


When you create an index on a column or columns in a MySQL table, the database creates a data structure called a B-tree (assuming you use the default index type), in which the key of each record is a concatenation of the values in the indexed columns.

For example, let's say you have a table that is defined like:

CREATE TABLE mytable (
 id int unsigned auto_increment,
 column_a char(32) not null default '',
 column_b int unsigned not null default 0,
 column_c varchar(512),
 column_d varchar(512),
 PRIMARY KEY (id)
) ENGINE=MyISAM;

Then let's give it some data:

INSERT INTO mytable VALUES (1, 'hello', 2, null, null);
INSERT INTO mytable VALUES (2, 'hello', 3, 'hi', 'there');
INSERT INTO mytable VALUES (3, 'how', 4, 'are', 'you?');
INSERT INTO mytable VALUES (4, 'foo', 5, '', 'bar');

Now suppose you decide to add a key to column_a and column_b like:

ALTER TABLE mytable ADD KEY (column_a, column_b);

The database is going to create the aforementioned B-tree, which will have four keys in it, one for each row, stored in sorted order:

foo-5
hello-2
hello-3
how-4

When you perform a search that references the column_a column, or that references the column_a AND column_b columns, the database will be able to use this index to narrow the record set it has to examine. Let's say you have a query like:

SELECT ... FROM mytable WHERE column_a = 'hello';

Even though the above query does not specify a value for the column_b column, it can still take advantage of our index by looking for all keys that begin with "hello". For the same reason, if you had a query like:

SELECT ... FROM mytable WHERE column_b = 2;

This query would NOT be able to use our index, because column_b is not a leftmost prefix of the index: the database would have to scan every key to find those whose second value matches 2, which is terribly inefficient.
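
You can confirm which index (if any) a query will use with EXPLAIN. Here is a minimal sketch against the example table above; the exact output varies by version, and column_a is the name MySQL auto-generates for the unnamed key:

EXPLAIN SELECT * FROM mytable WHERE column_a = 'hello';
-- key: column_a    (the compound index is used)

EXPLAIN SELECT * FROM mytable WHERE column_b = 2;
-- key: NULL        (no usable index; MySQL falls back to a full table scan)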

Now, let's address your original question of the maximum length. Suppose we try to create an index spanning all four non-PK columns in this table:

ALTER TABLE mytable ADD KEY (column_a, column_b, column_c, column_d);

You will get an error:

ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

In this case the key parts take roughly 32 bytes (the CHAR), 4 bytes (the INT), and 512 bytes apiece for the two VARCHARs, which comes to over 1000 bytes even in a single-byte-per-character charset. Suppose that it DID work; you would be creating the following keys:

hello-2-
hello-3-hi-there
how-4-are-you?
foo-5--bar

Note that the limit is enforced against the declared column sizes, not the values actually stored: even if every row held short strings, the two VARCHAR(512) columns alone could require more than 1000 bytes of key space, which is what MySQL is complaining about. It gets even worse with multibyte character sets such as utf8, where each character can take up to three bytes, so seemingly "small" columns can still push the keys over the limit.
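
To make the arithmetic concrete, here is the rough per-key math for the four-column index above (approximate; it ignores a few bytes of per-column overhead such as NULL flags):

-- latin1 (1 byte/char): 32 + 4 + (512+2) + (512+2)    = 1064 bytes  > 1000
-- utf8   (3 bytes/char): 96 + 4 + (1536+2) + (1536+2) = 3176 bytes  > 1000
-- (the +2 is the length prefix stored for each VARCHAR key part)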

If you MUST use a large compound key, one solution is to use InnoDB tables rather than the default MyISAM tables, which support a larger key length (3500 bytes) -- you can do this by swapping ENGINE=InnoDB instead of ENGINE=MyISAM in the declaration above. However, generally speaking, if you are using long keys there is probably something wrong with your table design.
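
For example (a sketch: the conversion rebuilds the entire table, so expect it to take a while on 100k+ rows, and note that the docs quoted in another answer below limit each individual InnoDB column prefix to 767 bytes):

ALTER TABLE mytable ENGINE=InnoDB;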

Remember that single-column indexes often provide more utility than multi-column indexes. You want to use a multi-column index when your queries will often/always take advantage of it by specifying all of the necessary criteria. Also, as others have mentioned, do NOT index every column of a table, since each index adds storage overhead and slows down writes. You want to limit your indexes to the columns that are frequently used by queries, and if it seems like you need too many, you should probably think about breaking your tables up into more logical components.

futureal
  • Thanks for great explanation. What do you think about the _prefix indexes_ solution given by @bill-karvin here?: http://stackoverflow.com/a/8747703/569439 How do you think such indexes will work? – rineez Mar 25 '17 at 14:42

Indexes generally aren't well suited for custom calculations where the user is able to construct their own queries. Typically you choose the indexes to match the specific queries you intend to run, using EXPLAIN to see if the index is being used.

In the case that you have absolutely no idea what queries might be performed, it is generally best to create one index per column, not one index covering all columns.
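
As a hedged sketch of what that can look like, reusing the mytable example from the answer above (the index names are my own, and the long VARCHAR columns need a prefix length to stay under the key-size limit):

ALTER TABLE mytable
  ADD KEY idx_a (column_a),
  ADD KEY idx_b (column_b),
  ADD KEY idx_c (column_c(255)),
  ADD KEY idx_d (column_d(255));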

If you have a good idea of which queries will be run often, you can create extra indexes for those specific queries. You can also add indexes later if your users complain that certain types of queries run too slowly.

Also, indexes generally aren't that useful for calculating counts, sums and averages since these types of calculations require looking at every row.
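
One caveat worth noting: if an index happens to contain every column a query needs (a "covering" index), MySQL can satisfy the aggregate by scanning the smaller index instead of the whole table. A minimal sketch, assuming the mytable example from the first answer:

ALTER TABLE mytable ADD KEY idx_colb (column_b);

SELECT SUM(column_b), AVG(column_b) FROM mytable;
-- EXPLAIN lists "Using index" in the Extra column: the values are read from idx_colb alone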

Mark Byers
  • "using EXPLAIN to see if the index is being used." Thanks. I'll be sure to check that. But on the expected queries, how do you think I should best form the indexes? I'm assuming generally by what fields are referenced in `WHERE`, but I don't know why I'm running into or how to avoid a byte limit there. Often it prevents me from adding more than a single field to the index. – bob-the-destroyer Jan 09 '11 at 03:26

It sounds like you are trying to put too many fields into your index. The limit is probably the number of bytes it takes to encode all of the fields.

The index is used in looking up records, so you want to choose the fields that you are "WHERE"ing on. Among those fields, pick the ones that will narrow the results the fastest.

As an example, a filter on Male/Female will usually not help much, because it only eliminates about half the rows. A filter on State may be more useful because it breaks the data into many more categories; however, if almost everybody in the database is in a single state, then even that won't help.
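
A quick way to estimate how selective a column is before indexing it is to compare distinct values to total rows; the closer the result is to 1.0, the faster the column narrows a search. The table and column names below are purely illustrative:

SELECT COUNT(DISTINCT state) / COUNT(*) AS selectivity FROM people;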

Winston Ewert

Remember that indexes are for sorting and finding rows.

The error message you got sounds like it is talking about the 1000-byte prefix limit for MyISAM table indexes. From http://dev.mysql.com/doc/refman/5.0/en/create-index.html:

The statement shown here creates an index using the first 10 characters of the name column:

CREATE INDEX part_of_name ON customer (name(10));

If names in the column usually differ in the first 10 characters, this index should not be much slower than an index created from the entire name column. Also, using column prefixes for indexes can make the index file much smaller, which could save a lot of disk space and might also speed up INSERT operations.

Prefix support and lengths of prefixes (where supported) are storage engine dependent. For example, a prefix can be up to 1000 bytes long for MyISAM tables, and 767 bytes for InnoDB tables.

Maybe you can try a FULLTEXT index for problematic columns.
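
Putting those two ideas together against the mytable example from the top answer (the prefix lengths are guesses you would tune to your own data):

-- a prefix index keeps the key comfortably under MyISAM's 1000-byte limit,
-- even in utf8 (100 chars * 3 bytes * 2 columns = 600 bytes plus overhead)
ALTER TABLE mytable ADD KEY prefix_cd (column_c(100), column_d(100));

-- MyISAM also supports FULLTEXT indexes for word-based searches
ALTER TABLE mytable ADD FULLTEXT KEY ft_cd (column_c, column_d);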

Matt Montag
  • @Matt Montag: I ended up naming each index 'a', 'b', 'c', etc., which still caused this size error, which is why I'm confused about index size limits here. – bob-the-destroyer Jan 09 '11 at 04:11