How do MySQL indexes work?

Question

I am really interested in how MySQL indexes work, more specifically, how can they return the data requested without scanning the entire table?

It's off-topic, I know, but if there is someone who could explain this to me in detail, I would be very, very thankful.

This is a very broad question. If you have a specific example of a query that won't use an index, and you don't know why, you could post it and people might help. — Hammerite, Aug 25 '10 at 16:16
`SELECT * FROM members WHERE id = '1'` - so why does it work faster with an index? What does the index do here? — good_evening, Aug 25 '10 at 16:17
That looks like a query that just looks up a specific, indexed record (perhaps identified by primary key). The index makes this faster because it is a small, sorted structure that is often cached in memory: MySQL finds the matching index entry, which tells it where the actual data is stored, so it can go to the exact location in the table without having to scan the table. — Hammerite, Aug 25 '10 at 16:21

score 549 · Accepted Answer · answered Aug 25 '10 at 16:35

Basically an index on a table works like an index in a book (that's where the name came from):

Let's say you have a book about databases and you want to find some information about, say, storage. Without an index (assuming no other aid, such as a table of contents) you'd have to go through the pages one by one until you found the topic (that's a full table scan). An index, on the other hand, has a list of keywords, so you'd consult it and see that storage is mentioned on pages 113-120, 231, and 354. Then you could flip to those pages directly, without searching (that's a search with an index, somewhat faster).

Of course, how useful the index will be depends on many things. A few examples, using the simile above:

If you had a book on databases and indexed the word "database", you'd see that it's mentioned on pages 1-59, 61-290, and 292-400. In such a case, the index is not much help and it might be faster to go through the pages one by one (in a database, this is "poor selectivity").
For a 10-page book, it makes no sense to make an index, as you may end up with a 10-page book prefixed by a 5-page index, which is just silly. Just scan the 10 pages and be done with it.
The index also needs to be useful. There's generally no point in indexing, e.g., the frequency of the letter "L" per page.
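
To make the book analogy concrete, here is a rough Python sketch (not MySQL code; the `pages` data and all function names are made up for illustration). It compares a page-by-page scan with a lookup in a prebuilt word index, and computes a crude "selectivity" figure for a word.

```python
# Toy "book": page number -> text on that page (hypothetical data).
pages = {
    1: "introduction to database systems",
    2: "database storage engines",
    3: "query planning in a database",
    4: "storage layout and pages",
}

def full_scan(word):
    """Read every page, like a full table scan."""
    return [n for n, text in pages.items() if word in text.split()]

# Build the index once: word -> list of pages mentioning it (in page order).
index = {}
for n, text in pages.items():
    for w in set(text.split()):
        index.setdefault(w, []).append(n)

def index_lookup(word):
    """Jump straight to the matching pages without reading the others."""
    return index.get(word, [])

def selectivity(word):
    """Fraction of pages matched; close to 1.0 means the index barely helps."""
    return len(index_lookup(word)) / len(pages)
```

In this toy book, "database" appears on three of the four pages, so looking it up in the index saves almost nothing, while "layout" appears on one page only.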

You are explaining what it is, not how it technically works internally. — Tutu Kumari, Apr 12 '19 at 07:47
@Tutu Kumari: See the question's revisions; feel free to also revise the answer to fit the current question (note the various engines and index types - see e.g. the documentation here: https://dev.mysql.com/doc/refman/8.0/en/index-btree-hash.html ) — Piskvor left the building, Apr 12 '19 at 07:48
Could you give some rough ranges? For example, take a blog with a 'users' table and a 'posts' table that allows combined searches. At roughly what order of magnitude (how many users and/or how many posts) do proper indexes become advisable, or even a must? Any thresholds for when to start worrying about indexes (how many rows/columns) would be appreciated. — Robert, Mar 13 '21 at 18:55

score 286 · Answer 2 · edited Mar 21 '17 at 07:27

The first thing you must know is that indexes are a way to avoid scanning the full table to obtain the result that you're looking for.

There are different kinds of indexes and they're implemented in the storage layer, so there's no standard between them and they also depend on the storage engine that you're using.

InnoDB and the B+Tree index

For InnoDB, the most common index type is the B+Tree-based index, which stores its entries in sorted order. Also, when a query needs only columns that are in the index, MySQL doesn't have to access the table rows at all, which makes the query return much faster.

The "problem" with this index type is that a query has to constrain the leftmost column(s) of the index to use it. So, if your index has two columns, say last_name and first_name, the order of the columns in the index matters a lot.

So, given the following table:

CREATE TABLE person (
    last_name VARCHAR(50) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    INDEX (last_name, first_name)
);

This query would take advantage of the index:

SELECT last_name, first_name FROM person
WHERE last_name = 'John' AND first_name LIKE 'J%';

But the following one would not:

SELECT last_name, first_name FROM person WHERE first_name = 'Constantine';

Because it filters only on first_name, which is not the leftmost column in the index, MySQL cannot seek directly to the matching entries; at best it reads the whole index.

This last example is even worse:

SELECT last_name, first_name FROM person WHERE first_name LIKE '%Constantine';

Because now you're matching the rightmost part of the rightmost column in the index, and a pattern with a leading wildcard cannot take advantage of the sort order.
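
The leftmost-column rule follows from the sort order. As a rough illustration in Python (a sorted list standing in for the B+Tree; the sample names and helper functions are hypothetical), entries sorted by (last_name, first_name) let you jump to one contiguous range when last_name is given, while a lookup on first_name alone has to check every entry:

```python
import bisect

# Index entries kept sorted by (last_name, first_name),
# like INDEX (last_name, first_name).
entries = sorted([
    ("Smith", "Anna"),
    ("John", "Jack"),
    ("John", "Jill"),
    ("John", "Mary"),
    ("Adams", "Constantine"),
])

def seek_last_name_first_prefix(last_name, first_prefix):
    """WHERE last_name = ? AND first_name LIKE 'prefix%': one contiguous range."""
    lo = bisect.bisect_left(entries, (last_name, first_prefix))
    # "\uffff" sorts after ordinary characters, so this bounds the prefix range.
    hi = bisect.bisect_left(entries, (last_name, first_prefix + "\uffff"))
    return entries[lo:hi]

def scan_first_name(first_name):
    """WHERE first_name = ?: matches are scattered, so every entry is checked."""
    return [e for e in entries if e[1] == first_name]
```

The first function touches only the matching slice; the second one cannot use the ordering because entries with the same first_name are not adjacent.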

The hash index

This is a different index type that, unfortunately, InnoDB does not let you create; of the common storage engines, only MEMORY supports it. It's lightning fast but only useful for exact-match lookups (= and <=>), which means that you can't use it for operations like >, < or LIKE.

Since it only works with the MEMORY engine, you probably won't use it very often. The main case I can think of is when you create a temporary in-memory table from the results of another SELECT and then run many further SELECTs against that temporary table using hash indexes.

If you have a big VARCHAR field, you can "emulate" a hash index with a B-Tree by creating another column and storing a hash of the big value in it. Let's say you're storing a URL in a field and the values are quite big. You could add an unsigned integer field called url_hash and use a hash function such as CRC32 to hash the URL when inserting it. Then, when you need to query for this value, you can do something like this:

SELECT url FROM url_table WHERE url_hash = CRC32('http://gnu.org');

The problem with the above example is that CRC32 produces a small, 32-bit hash, so on a large table different URLs will sometimes share the same hash value (collisions). If you need exact matches, you can fix this problem by doing the following:

SELECT url FROM url_table
WHERE url_hash = CRC32('http://gnu.org') AND url = 'http://gnu.org';

It's still worth hashing even if the number of collisions is high, because the second comparison (the string one) runs only against the rows whose hashes match.

Unfortunately, using this technique, you still need to hit the table to compare the url field.
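
The same idea can be sketched in Python, purely as an illustration: `zlib.crc32` stands in for MySQL's CRC32(), and the `urls` list and `url_hash_index` dictionary are hypothetical stand-ins for the table and the index on url_hash.

```python
import zlib
from collections import defaultdict

def crc32(url):
    """Unsigned 32-bit checksum, standing in for MySQL's CRC32()."""
    return zlib.crc32(url.encode("utf-8"))

# The "table": row id -> url (hypothetical sample rows).
urls = ["http://gnu.org", "http://example.com", "http://example.org"]

# The "index on url_hash": small integer key -> row ids sharing that hash.
url_hash_index = defaultdict(list)
for row_id, url in enumerate(urls):
    url_hash_index[crc32(url)].append(row_id)

def find_url(url):
    """WHERE url_hash = CRC32(?) AND url = ?"""
    candidates = url_hash_index.get(crc32(url), [])
    # The string comparison (the "table hit") runs only on hash matches.
    return [row_id for row_id in candidates if urls[row_id] == url]
```

Even if two URLs collide on the same hash, the final string comparison filters out the wrong one, which mirrors the second query above.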

Wrap up

Some facts to keep in mind whenever you talk about optimization:

Integer comparison is much faster than string comparison. The emulation of the hash index in InnoDB illustrates this.
Adding steps to a process sometimes makes it faster, not slower. For example, you can optimize a SELECT by splitting it into two steps: the first stores values in a newly created in-memory table, and the heavier queries then run against that table.

MySQL has other index types too. I think the B+Tree is by far the most used and the hash index is good to know about; you can find the others in the MySQL documentation.

I highly recommend reading the book "High Performance MySQL"; the answer above is based on its chapter about indexes.

Will the following queries benefit from the index in the above case? 1. `SELECT last_name, first_name FROM person WHERE last_name= "Constantine"` 2. `SELECT last_name, first_name FROM person WHERE last_name LIKE "%Constantine"` — AkshayT, Nov 30 '13 at 06:18
The first query will; the second will not. Use EXPLAIN: http://dev.mysql.com/doc/refman/5.5/en/explain.html To index the second query with MySQL, you have to use a FULLTEXT INDEX: http://dev.mysql.com/doc/refman/5.5/en/fulltext-search.html — Emilio Nicolás, May 29 '14 at 11:30
I upvoted you because you were at 127 and the #1 answer was at 256. I couldn't avoid making everything nice and clean, binary-wise. — pbarney, Oct 11 '16 at 19:01
This was new information for me "order that you query these fields matters a lot." thanks. — Khatri, Nov 03 '16 at 06:12
I recently went into a similar problem where I had to uniquely identify a record based on the relative URL. I solved this by putting a unique index on the url field (which has a char length of 768 chars which means 3072 bytes for the whole field because of MySQL `utf8mb4`). If I had used the suggested CRC32 hashing, I could have achieved similar results by creating an index on an integer field (CRC32) of just 4 bytes instead of the 3072 bytes per field that I am using now! — Rafay, Sep 15 '17 at 13:59
Coming back to this answer after 1 year and it again helped me optimize the crap outta' my table architecture and SQL queries. Cheers!! — Aditya Hajare, Mar 30 '18 at 10:01
@pbarney after three years they're near 256 and 512 respectively, that's what I call a binary-wise increase! — nanocv, Nov 01 '19 at 09:20

score 51 · Answer 3 · answered Aug 25 '10 at 16:33

Basically an index is a map of all your keys that is sorted in order. With a list in order, then instead of checking every key, it can do something like this:

1: Go to middle of list - is higher or lower than what I'm looking for?

2: If higher, go to halfway point between middle and bottom, if lower, middle and top

3: Is higher or lower? Jump to middle point again, etc.

Using that logic, you can find an element in a sorted list of about 100 items in at most 7 steps, instead of checking every item.

Obviously there are complexities, but that gives you the basic idea.
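
For what it's worth, the steps above can be sketched in a few lines of Python. This is plain binary search over a sorted list, not how MySQL's B+Tree is actually implemented, and the 100-key list is just the size mentioned in the comments on this answer:

```python
def binary_search(sorted_keys, target):
    """Return (position or None, number of middle elements inspected)."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1   # target is in the upper half
        else:
            hi = mid - 1   # target is in the lower half
    return None, steps

keys = list(range(100))
# Largest number of steps needed to find any key that is present.
worst = max(binary_search(keys, k)[1] for k in keys)
```

Each step halves the remaining range, so the step count grows with the logarithm of the list size rather than with the size itself.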

answered Aug 25 '10 at 16:33 by Joshua

This is called binary search. – Cameron Martin Jun 11 '12 at 16:09
Thanks, finally an answer that explains why it is quicker and not just how the db functions with indexes. – Gershon Herczeg Jul 09 '13 at 16:27
The actual number of steps is highly dependent on the data: the number of unique values and their distribution across your range. 7 is the theoretical max for 100 values. Full discussion of how to calculate the number of steps here http://stackoverflow.com/questions/10571170/how-many-comparisons-will-binary-search-make-in-the-worst-case-using-this-algori – Joshua May 14 '15 at 15:44
The most common MySQL index is a B+Tree which works similarly to a binary search but not quite the same. The algorithmic complexity is the same but the way it searches is not. See https://en.wikipedia.org/wiki/B-tree – Matt Jul 23 '15 at 20:22

score 5 · Answer 4 · answered Aug 24 '19 at 07:31

In MySQL InnoDB, there are two types of index.

The primary key, which is the clustered index. The index key values are stored together with the actual row data in the B+Tree leaf nodes.
Secondary keys, which are non-clustered indexes. These indexes store only their own key values along with the primary key values in the B+Tree leaf nodes. So a search through a secondary index first finds the primary key value and then searches the primary key B+Tree to find the actual row. This makes a secondary index lookup slower than a primary key lookup. However, if all the selected columns are in the secondary index, there is no need to look up the primary key B+Tree at all. This is called a covering index.
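
A rough way to picture the two lookups in Python (a dict and a sorted list standing in for the two B+Trees; the table contents and function names are invented for this sketch):

```python
import bisect

# Clustered (primary) index: primary key -> full row.
primary = {
    1: {"id": 1, "email": "a@example.com", "name": "Ann"},
    2: {"id": 2, "email": "c@example.com", "name": "Cy"},
    3: {"id": 3, "email": "b@example.com", "name": "Bo"},
}

# Secondary index on email: sorted (email, primary key) pairs, nothing else.
by_email = sorted((row["email"], pk) for pk, row in primary.items())

def find_pk_by_email(email):
    """Covered lookup: the id is already stored in the secondary index."""
    i = bisect.bisect_left(by_email, (email,))
    if i < len(by_email) and by_email[i][0] == email:
        return by_email[i][1]
    return None

def find_row_by_email(email):
    """Non-covered lookup: secondary index first, then a search by primary key."""
    pk = find_pk_by_email(email)
    return primary[pk] if pk is not None else None
```

Asking only for the id (or the email) never touches `primary`, which is the covering-index case; asking for `name` needs the second lookup.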

score 4 · Answer 5 · answered Aug 25 '10 at 16:15

Take a look at this link: http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

How they work is too broad of a subject to cover in one SO post.

Here is one of the best explanations of indexes I have seen. Unfortunately it is for SQL Server and not MySQL. I'm not sure how similar the two are...

answered Aug 25 '10 at 16:15 by Abe Miessler

Nice article. I don't know SQL Server, but the basic workings look very similar. (metanote: disabling CSS styles in the 2nd linked article unhides the content) – Piskvor left the building Aug 25 '10 at 16:24

score 2 · Answer 6 · edited Apr 19 '17 at 04:43

Take a look at these videos for more details about indexing.

Simple Indexing

You can create a unique index on a table. A unique index means that two rows cannot have the same index value. Here is the syntax to create an index on a table:

CREATE UNIQUE INDEX index_name
ON table_name ( column1, column2,...);

You can use one or more columns to create an index. For example, we can create an index on tutorials_tbl using tutorial_author.

CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author)

You can also create a simple (non-unique) index on a table: just omit the UNIQUE keyword from the statement. A simple index allows duplicate values in a table.

If you want to index the values in a column in descending order, you can add the reserved word DESC after the column name. (Before MySQL 8.0, DESC in an index definition was parsed but ignored.)

CREATE UNIQUE INDEX AUTHOR_INDEX
ON tutorials_tbl (tutorial_author DESC);
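
To show what "unique" means in practice, here is a toy Python sketch (the `UniqueIndex` class and the sample author name are hypothetical, and a real engine does this inside its B+Tree rather than with a dict): inserting a key that already exists is rejected.

```python
class UniqueIndex:
    """Toy unique index: key -> row id, rejecting duplicate keys."""

    def __init__(self):
        self._entries = {}

    def insert(self, key, row_id):
        if key in self._entries:
            # Roughly what a database reports as a duplicate-key error.
            raise ValueError(f"duplicate entry {key!r}")
        self._entries[key] = row_id

    def lookup(self, key):
        return self._entries.get(key)


author_index = UniqueIndex()
author_index.insert("Sample Author", 1)
```

A simple (non-unique) index would instead keep a list of row ids per key and accept the second insert.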

edited Apr 19 '17 at 04:43 by The Hungry Dictator · answered Apr 19 '17 at 04:17 by shahirnana

Welcome to Stack Overflow! I've noted that all your answers link to your own videos. Please note that [overt self promotion is not allowed](https://stackoverflow.com/help/behavior). – S.L. Barth is on codidact.com Apr 19 '17 at 13:22
He wants to promote his videos. LOL – Ilyas karim Apr 28 '18 at 12:47
Not to mention his linked video is not available anymore... – goulashsoup Sep 06 '22 at 16:59
This post does not answer the question at all. The question was about how indexes work, not how to create them... – goulashsoup Sep 06 '22 at 17:07

score 2 · Answer 7 · answered Aug 21 '19 at 14:20

Adding some visual representation to the list of answers.

MySQL (InnoDB) uses an extra layer of indirection: secondary index records point to primary index records, and the primary (clustered) index itself holds the row data. If a row moves, only the primary index needs to be updated.

Caveat: Disk data structure looks flat in the diagram but actually is a B+ tree.

Source: link

score 1 · Answer 8 · answered Jul 24 '19 at 11:24

I want to add my 2 cents. I am far from being a database expert, but I've recently read up a bit on this topic; enough for me to try and give an ELI5. So, here's my layman's explanation.

As I understand it, an index is like a mini-mirror of your table, pretty much like an associative array. If you feed it a matching key, you can jump to that row in one "command".

But if you didn't have that index / array, the query interpreter must use a for-loop to go through all rows and check for a match (the full-table scan).

Having an index has the "downside" of extra storage (for that mini-mirror), in exchange for the "upside" of looking up content faster.

Note that (depending on your db engine) creating primary, foreign or unique keys automatically sets up a corresponding index as well. That same principle is basically why and how those keys work.

score 0 · Answer 9 · answered Sep 27 '20 at 17:17

Let's suppose you have a book, probably a novel, a thick one with lots to read and hence lots of words. Now, hypothetically, you brought two dictionaries that contain only the words used at least once in the novel. The words in both dictionaries are sorted alphabetically. In hypothetical dictionary A, each word is printed only once, while in hypothetical dictionary B each word is printed as many times as it appears in the novel. Now you get stuck on a word while reading the novel and need to find its meaning in either of those hypothetical dictionaries. What will you do? Surely you will jump to that word in a few steps to find its meaning, rather than look up the meaning of every word in the novel from the beginning until you reach the word that is bugging you.

This is how an index works in SQL. Consider dictionary A as a PRIMARY INDEX, dictionary B as a KEY/SECONDARY INDEX, and your wish to get the meaning of the word as a QUERY/SELECT STATEMENT. The index helps fetch the data very quickly. Without an index, you would have to search for the data from the beginning, which is an unnecessarily time-consuming and costly task.

For more about indexes and their types, look at this.

score 0 · Answer 10 · answered Sep 08 '22 at 10:57

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. This is much faster than reading every row sequentially.

Indexing adds a data structure with columns for the search conditions and a pointer.
The pointer is the address on disk of the row with the rest of the information.
The index data structure is sorted to optimize query efficiency.
The query looks for the specific row in the index; the index entry holds the pointer, which locates the rest of the information.
The index reduces the number of rows the query has to search through from 17 to 4.
