55

I've heard them talked about since I started working in tech about 18 months ago. I know that they potentially improve performance, and they seem to be column specific -- ("We index the User table on the date_of_birth column").

Just looking for a quick overview of what exactly they are, what they are used for, and how they work.

Clay Wardell
  • 14,846
  • 13
  • 44
  • 65

5 Answers5

78

I wrote a complete book about it! It's also available for free on the web: http://use-the-index-luke.com/

I try to answer your questions shortly—which is not exactly what I'm good at. The last time I tried, I ended up writing a book...

Like tables, indexes consist of rows and columns but store the data in a logically sorted manner to improve search performance. Think of it like a telephone book (a printed one). They are usually sorted last_name, first_name and potentially other criteria (e.g. zip code). This sorting makes it possible to find all entries for a specific last name quickly. If you know the first name too, you can even find the entries for the combination last name/first name very quickly.

If you just know the first name, however, the telephone book does not really help you. The very same is true for multi-column database indexes. So yes, an index can potentially improve search performance. If you have the wrong index for your question (e.g. a phonebook when searching by first name) they might be useless.

You can have many indexes on the same table but on different columns. So, an index on last_name,first_name is different from an index on first_name only (which you would need to optimize searches by first name).

Indexes hold redundant data (ex: clustered indexes = telephone book). They have the same information as stored in the table (ex: function based indexes), but in a sorted manner. This redundancy is automatically maintained by the database for each write operation you perform (insert/update/delete). Consequently, indexed decrease write performance.

Besides finding data quickly, indexes can also be used to optimize sort operations (order by) and physically arrange related data closely together (clustering).

To get a better idea, look at the full table of contents of my book: http://use-the-index-luke.com/sql/table-of-contents

alex
  • 479,566
  • 201
  • 878
  • 984
Markus Winand
  • 8,371
  • 1
  • 35
  • 44
  • 5
    You completely neglected to mention that sometimes indexes are used not for performance but to enforce uniqueness across multiple columns. – horseyguy Sep 06 '14 at 12:58
11

Think of it as a table of contents for tables. If it's there, the database knows where to look more specific. If it ain't there, the database has to search through all the data to find it.

A way more detailed explanation can be found here in this Wikipedia article.

Bjoern
  • 15,934
  • 4
  • 43
  • 48
9

A database index is a datastructure aimed at improving the time complexity of lookup operation.

Lookup with no index is in worst case O(N) complexity. Efficient lookup with index enables logarithmic O(log(N)) or even with some architechture O(1) complexity.

A database index also make it possible to enforce DB constraints. Many DB systems set a index on a set of columns referred to as PRIMARY KEY. Some DB systems requires columns in a FOREIGN KEY to be indexed, so as to speed up operations (insert, update).

kiriloff
  • 25,609
  • 37
  • 148
  • 229
  • Non-trivial lookup can never be truly O(1). At best it just hides an O(log(N)) process "under" a heavyweight O(1) process, up to some point. – Hot Licks May 20 '13 at 18:22
  • O(1) is achieved with hash index in case of an equality (WHERE) query with queried field the indexed field, does this sounds correct ? – kiriloff May 21 '13 at 09:50
  • however : from postrgresql doc: Hash index operations are not presently WAL-logged, so hash indexes might need to be rebuilt with REINDEX after a database crash. They are also not replicated over streaming or file-based replication. For these reasons, hash index use is presently discouraged. – kiriloff May 21 '13 at 09:55
  • Ultimately a hash is O(log(N)), since storage is O(log(N)). It's just if you fix the storage size it looks like O(1). – Hot Licks May 21 '13 at 11:29
6

An index is an optional structure, associated with a table or table cluster, that can sometimes speed data access. By creating an index on one or more columns of a table, you gain the ability in some cases to retrieve a small set of randomly distributed rows from the table. Indexes are one of many means of reducing disk I/O.

If a heap-organized table has no indexes, then the database must perform a full table scan to find a value. For example, without an index, a query of location 2700 in the hr.departments table requires the database to search every row in every table block for this value. This approach does not scale well as data volumes increase.

http://docs.oracle.com/cd/E11882_01/server.112/e10713/indexiot.htm

Pamma
  • 1,455
  • 9
  • 16
5

It has a very similar thread running here. Check, it is helpful.

I know that they potentially improve performance

Yes, it's true. But, please keep it in mind, sometimes indexing can be the reason of POOR Performance as well. Example: Index all the columns of a database will undoubtedly affect the performance badly.

Community
  • 1
  • 1
Mayukh Roy
  • 1,815
  • 3
  • 19
  • 31