0

I'm still trying to get my head around the best way to use INDEXES in MySQL. How do you know when to merge them together and when to have them separate?

Below are the indexes from the Wordpress posts table. See how post_name, post_parent and post_author are seperate entries? And then they have type_status_date which is a mixture of 4 fields?

http://img215.imageshack.us/img215/5976/screenshot20120426at431.png

I don't understand the logic behind this? Can anyone enlighten me?

Ben Sinclair
  • 3,896
  • 7
  • 54
  • 94

2 Answers2

3

Going to be a bit of a long answer but here we go. Please note I am not going to deal with the differences in database engines here(MyISAM and InnoDB have distinct way of implementing what I am trying to describe)

First thing you have to understand about a index is that it is a separate data structure stored on disk. Normally this is a b-tree data structure containing the column(s) that you have indexed and also contain a pointer to the row in the table(this pointer is normally the primary key).

The only index that is stored with the data is the primary key index. Thus a primary key index IS the table.

Lets assume you have following table definition.

CREATE  TABLE `Student` (
 `StudentNumber` INT NOT NULL ,
 `Name` VARCHAR(32) NULL ,
 `Surname` VARCHAR(32) NULL ,
 `StudentEmail` VARCHAR(32) NULL ,
 PRIMARY KEY (`StudentNumber`) );

Since we have a primary key on StudentID there will be a index containing the primary key and the other columns in the index. If you had to look at the data in the index you would probably see something like this.

1 , John ,Doe ,Jdoe@gmail.com

As you can see this is the table data once again showing you that the primary key index IS the table.

The StudentNumber column is indexed which allows your to effectively search on it the rest of the data is stored with the key. Thus if ran the following query:

SELECT * FROM Student WHERE StudentNumber=1

MySQL would use the primary index to quickly find the row and the read the data stored with the indexed column. Since there is a index MySQL can use the index to do a effective binary seek operation on the b-tree.

Also when it comes to retrieving the data after doing the search MySQL can read the data from the index thus we are using 1 operation in the index to retrieve the data. Now if I ran the following query:

SELECT * FROM Student WHERE Name ='Joe' 

MySQL would check if there is a index that it could use to speed the query up. However in my case there is no index on name so MySQL would do a sequential read from the table one row at a time from the first row to the last.

At each row it would evaluate the row against the where clause and return matching row. So basically it reads the primary key index from top to bottom. Remember the primary key index is the table.

If I ran the following statement:

 ALTER TABLE `TimLog`.`student` 
ADD INDEX `ix_name` (`Name` ASC) ;
 ALTER TABLE `TimLog`.`student` 
ADD INDEX `ix_surname` (`Surname` ASC) ;

MySQL would create new indexes on the Student table. This will be stored away from the table on disk and the data inside would look something like this:

Data in ix_Name
John, 1 <--PRIMARY KEY VALUE

Data in ix_Surname
Doe, 1  <--PRIMARY KEY VALUE

Notice the data in the ix_Name index is the name and the primary key value. Great so if I ran the previous select statement MySQL would then read the ix_name index and get the primary key value for matching items and then use the primary key index to get the rest of the data.

So the number of operations to get the data from the index is 2. The matching rows are found in the index and then a lookup happens on the primary key to get the row data out.

You now have the following query:

SELECT * FROM Student WHERE Name='John' AND surname ='Doe' 

Here MySQL cant use both indexes as it would be a waste of operations. If MySQL had to use both indexes in this query the following would happen(this should not happen).

1 Find in the ix_Name the rows with the value John
2 Read the primary key that matches to get the row data
3 Store the matching results
4 Find in the ix Surname the rows with the value Doe
5 Read the primary key that matches to get row data.
6 Store the matching results
7 Take the Name results and Surname results and merge them
8 Return query results.

This is really a waste of IO as MySQL would then read the table twice. Basically using one index would be better than trying to use two(I will explain in a momnet why). MySQL will choose 1 index to use in a this simple query.

So how does MySQL decide on which index to use?

MySQL keeps statistics around indexes internally. These statistics tell MySQL basically how unique a index is. So for the sake of argument lets say the surname index (ix_surname)was more unique than the name index(ix_name) MySQL would use the surname index (ix_surname).

Thus query retrieval would be like this:

1 Use the ix_surname and find rows that match the value Doe
2 Read the primary key and apply the filter for the value John on the actual column data in the row.
3 Return the matched row.

As you can see the number of operations in this search is much less. I have over simplified a lot of the technical detail. Indexing is a interesting thing to master but you have to look at it from the perspective of how do I get the data with the minimal amount of IO.

Hope it is as clear as mud now!

Namphibian
  • 12,046
  • 7
  • 46
  • 76
1

MySQL cannot normally use more than one index at a time. That means, for instance, that when you have a query that filters or sorts on two fields you put them both into the same index.

WordPress likely has a common query that filters and/or sorts on post_type, post_status and post_date. Making an educated guess as to what they stand for, this would likely be the core query for WordPress's Post listing pages. So the three fields are put into the same index.

staticsan
  • 29,935
  • 4
  • 60
  • 73
  • Really? What about [`index_merge`](http://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html)? – eggyal Apr 26 '12 at 06:38
  • I've never seen an index_merge in an `EXPLAIN` statement. And reading the documentation, it doesn't happen very often or easily. – staticsan Apr 26 '12 at 06:41
  • Index Merge does not mean it is merging a index.The Index Merge method is used to retrieve rows with several range scans and to merge their results into one. The merge can produce unions, intersections, or unions-of-intersections of its underlying scans. This access method merges index scans from a single table; it does not merge scans across multiple tables. – Namphibian Apr 26 '12 at 06:46
  • That's really helpful... SO based on the query, MySQL figures out what the best index to use? Does it only use an index if it matches exactly? For example, if the search was for post_type, post_status and post_author, would it still use the type_status_date INDEX or would it use no INDEX? – Ben Sinclair Apr 26 '12 at 06:56
  • MySQL will figure out what index to use based on what is requested coupled with other statistics of the indices. It will also use a partial index, so a simple search filtering on just `post_type` will use the composite index. The MySQL documentation about how indexes work is pretty good, incidentally. – staticsan Apr 26 '12 at 07:04