The first thing you must know is that indexes are a way to avoid scanning the full table to obtain the result that you're looking for.
There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.
InnoDB and the B+Tree index
For InnoDB, the most common index type is the B+Tree based index, that stores the elements in a sorted order. Also, you don't have to access the real table to get the indexed values, which makes your query return way faster.
The "problem" about this index type is that you have to query for the leftmost value to use the index. So, if your index has two columns, say last_name and first_name, the order that you query these fields matters a lot.
So, given the following table:
CREATE TABLE person (
last_name VARCHAR(50) NOT NULL,
first_name VARCHAR(50) NOT NULL,
INDEX (last_name, first_name)
);
This query would take advantage of the index:
SELECT last_name, first_name FROM person
WHERE last_name = "John" AND first_name LIKE "J%"
But the following one would not
SELECT last_name, first_name FROM person WHERE first_name = "Constantine"
Because you're querying the first_name
column first and it's not the leftmost column in the index.
This last example is even worse:
SELECT last_name, first_name FROM person WHERE first_name LIKE "%Constantine"
Because now, you're comparing the rightmost part of the rightmost field in the index.
The hash index
This is a different index type that unfortunately, only the memory backend supports. It's lightning fast but only useful for full lookups, which means that you can't use it for operations like >
, <
or LIKE
.
Since it only works for the memory backend, you probably won't use it very often. The main case I can think of right now is the one that you create a temporary table in the memory with a set of results from another select and perform a lot of other selects in this temporary table using hash indexes.
If you have a big VARCHAR
field, you can "emulate" the use of a hash index when using a B-Tree, by creating another column and saving a hash of the big value on it. Let's say you're storing a url in a field and the values are quite big. You could also create an integer field called url_hash
and use a hash function like CRC32
or any other hash function to hash the url when inserting it. And then, when you need to query for this value, you can do something like this:
SELECT url FROM url_table WHERE url_hash=CRC32("http://gnu.org");
The problem with the above example is that since the CRC32
function generates a quite small hash, you'll end up with a lot of collisions in the hashed values. If you need exact values, you can fix this problem by doing the following:
SELECT url FROM url_table
WHERE url_hash=CRC32("http://gnu.org") AND url="http://gnu.org";
It's still worth to hash things even if the collision number is high cause you'll only perform the second comparison (the string one) against the repeated hashes.
Unfortunately, using this technique, you still need to hit the table to compare the url
field.
Wrap up
Some facts that you may consider every time you want to talk about optimization:
Integer comparison is way faster than string comparison. It can be illustrated with the example about the emulation of the hash index in InnoDB
.
Maybe, adding additional steps in a process makes it faster, not slower. It can be illustrated by the fact that you can optimize a SELECT
by splitting it into two steps, making the first one store values in a newly created in-memory table, and then execute the heavier queries on this second table.
MySQL has other indexes too, but I think the B+Tree one is the most used ever and the hash one is a good thing to know, but you can find the other ones in the MySQL documentation.
I highly recommend you to read the "High Performance MySQL" book, the answer above was definitely based on its chapter about indexes.