1

I wanted to understand how indexing in MySql works. I've few questions regarding indexing.

First is do we have to index columns which will be having only unique values or can we index column in which values can repeat for eg. lastname. I know it's foolish to index lastname but I want to understand how it works. So what I understand by this is...

For Eg. there are 1000 records in a table. and there are 400 lastnames repeating. So if we index "lastname", mysql will take all the unique values and index them and when the search query is fired instead of searching in the 1000 records it will just go through the 600 indexed records which even includes the repeating values once, just saving time.

something like.....

lastnames :-

SMITH

JOHNSON

JONES

BROWN

DAVIS

SMITH //repeat

JHONSON //repeat

SMITH //repeat

BROWN //repeat

WILLIAMS

MySql Index

  1. SMITH

  2. JOHNSON

  3. JONES

  4. BROWN

  5. DAVIS

  6. WILLIAMS

Am I correct....?

Umar Maniar
  • 512
  • 1
  • 5
  • 14
  • 1
    This should answer your question http://stackoverflow.com/questions/1108/how-does-database-indexing-work [1]: http://stackoverflow.com/questions/1108/how-does-database-indexing-work – Matt Asbury Apr 12 '12 at 14:39

3 Answers3

2

There are several indexes, but lets take the btree. This index is a binary tree, with two branches per node.

Making an index You make a binary tree with half of your values left, and the other half right. Easiest is to look at it with numbers: if you have number 1 to 6 you make a tree with 5 at the top, then 2 with 1 and 3, and right you'd have 5 with 4 and 6 as leaves.

Searching something with an index: What you basically ask is "is this node 'less' or 'more' then the value you're looking for. So you ask the first node (discarding half of your values), and go down, meaning you only need to search log(n) values for an index of n values. To 'find' 3, you'd compare with 5 and 2, and there you are. This is WAAY speedy for big numbers.

Nanne
  • 64,065
  • 16
  • 119
  • 163
2

Your premise is somewhat correct. The benefits of an index of the performance of performing a lookup (SELECT). If you have a list of 1,000 last names (regardless of the number of unique names), and you want to find those that equal "Smith", you would have to look through all 1,000 rows to find which entries (if any) match your query. This can be very slow, as it your performance gets worse based on how many rows you have (regardless of the number of unique rows).

Now imagine you have your names alphabetized by Last Name. If you want to find any entries with the last name of "Smith", you could do a "binary search": pick the middle entry and see if the last name is less than or greater than "Smith" in alphabetical order. If it's less, then throw away the first half of the names and only deal with the last half. Pick the middle entry of the remaining names and compare it to Smith, etc...

What you've done is reduced your search time. Now, rather than having to check all n entries to find "Smith", you only have to check log(2)n entries, which can be much smaller for large values of n.

This is essentially what an index does, except the often use B+ trees (similar to the binary tree approach mentioned above but with some extra nice properties) that will help.

Regarding your uniqueness question, yes, you can apply an index to a non-unique column. An index is often used on a column which must be unique (such as a primary key) because, without an index, it can be very expensive to maintain uniqueness in a column. For instance, imagine that you want to add an entry with the last name of "Smith" but you have a unique constraint on the Last Name column. How do you know if there's already an entry named "Smith"? You'll have to search for it. Without an index, that will require examining n entries; with an index, only log(2)n. So it's usually a good idea to keep an index on a unique column to keep the performance reasonable.

Also, the Wikipedia article on database indexes answers your question in more detail.

Jeff Allen
  • 17,277
  • 8
  • 49
  • 70
-1

Read the "Optimization and Indexes" section of MySQL manual.

jordeu
  • 6,711
  • 1
  • 19
  • 19