33

I am maintaining a pretty sizable application and database and am noticing some poor database performance in a few of our stored procedures.

I always hear that "adding an index" can be done to help performance. I am certainly no DBA, and I do not understand what indexes are, why they help, and how to create them.

I basically need an indexes 101.

Can anyone give me resources so that I can learn?

scunliffe
  • 62,582
  • 25
  • 126
  • 161
mmattax
  • 27,172
  • 41
  • 116
  • 149

9 Answers9

34

As a rule of thumb, indexes should be on any fields that you use in joins or where clauses (if they have enough different values to make using an index worthwhile, field with only a few possible values doesn't benefit from an index which is why it is pointless to try to index a bit field).

If your structure has formally created primary keys (which it should, I never create a table without a primary key), those are by definition indexed becasue a primary key is required to have a unique index on it. People often forget that they have to index the foreign keys becasue an index is not automatically created when you set up the foreign key relationsship. Since the purpose of a foreign key is to give you a field to join on, most foreign keys should probably be indexed.

Indexes once created need to be maintained. If you have a lot of data change activity, they can get fragmented and slow performance and need to be refreshed. Read in Books online about indexes. You can also find the syntax for the create index statement there.

Indexes are a balancing act, every index you add usually will add time to data inserts, updates and deletes but can potentially speed up selects and joins in complex inserts, updates and deletes. There is no one formula for what are the best indexes although the rule of thumb above is a good place to start.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • 3
    It *could* be worthwhile indexing a bit column if the value you're searching for is going to be sparse in your index. If you have a million rows and only a few hundred have a bit value of 1 and that's what you're searching for then an index is going to help. It's not the *possible values* it's the *actual values* stored in your database that are relevant to whether the index is useful. In contrast: If you have a varchar(100) column but all the actual values are the same, indexes are useless on that column. – Dave Hilditch Apr 30 '15 at 15:46
25

Think of an index similar to a card catalog in the library. An index keeps you from having to search through every isle or shelf for a book. Instead, you may be able to find the items you want from a commonly used field, such as and ID, Name, etc. When you build an index the database basically creates something separate that a query could hit to rather than scanning the entire table. You speed up the query by allowing it to search a smaller subset of data, or an optimized set of data.

CodeRot
  • 893
  • 6
  • 12
6

Indexes is a method which database systems use to quickly find data. The real world analogy are indexes in books. If an author/publisher does a good job at indexing their book, it becomes pretty easy for the reader to directly go to the page they want to read simply by looking at the index. Same goes for a database. If an index is created on a field, the database pre-sorts the data. When a request is made on the data, the database uses the index to identify which location the data is stored in on the hard disk, and directly goes there. If there are no indexes, the database needs to look at every record in order to find out if it meets the criteria(s) of your query.

A simple way to look at indexes is by thinking of a deck of cards. A database which is not indexed is like a deck a cards which have been shuffled. If you want to find the king of spades, you need to look at every card one by one to find it. You might be lucky and it can be the first one, or you might be unlucky and it can be the last one.

A database which is indexed, has all the cards in the deck ordered from ace to king and each suite is set aside in its own pile. Looking for the king of spades is much simpler now because you simply need to look at the bottom of the pile of cards which contains the spades.

I hope this helps. Be warned though that although indexes are necessary in a relational database system, they can counter productive if you write too many of them. There's a ton of great articles on the web that you can read up on indexes. I'd suggest doing some reading before you dive into them.

user2013
  • 9,231
  • 5
  • 21
  • 8
5

An index basically sorts your data on the given columns and then stores that order, so when you want to find an item, the database can optimize by using binary search (or some other optimized way of searching), rather than looking at each individual row.

Thus, if the amount of data you are searching through is large, you will absolutely want to add some indexes.

Most databases have a tool to explain how your query will work (for db2, it's db2expln, something similar probably for sqlserver), and a tool to suggest indexes and other optimizations (db2advis for db2, again probably something similar for sqlserver).

Mike Stone
  • 44,224
  • 30
  • 113
  • 140
  • Most SQL databases implement some variant of "EXPLAIN " to output a query plan. In PostgreSQL, "EXPLAIN ANALYZE " additionally runs the query and times each part. – Nick Johnson Sep 19 '08 at 20:49
3

As previously stated, you can have a clustered index and multiple non-clustered indexes. In SQL 2005, you can also add additional columns to a non-clustered index, which can improve performance where a few commonly retrieved columns are included with the index but not part of the key, which eliminates a trip to the table altogether.

Your #1 tool for determining what your SQL Server database is doing is the profiler. You can profile entire workloads and then see what indexes it recommends. You can also look at execution plans to see what effects an index has.

The too-many indexes problem is due to writing into a database, and having to update all the indexes which would have a record for that row. If you're having read performance, it's probably not because of too many indexes, but too few, or too unsuitable.

Cade Roux
  • 88,164
  • 40
  • 182
  • 265
1

An index can be explained as a sorted list of the items in a register. It is very quick to lookup the position of the item in the register, by looking for it's key in the index. Next the the key in the index is a pointer to the position in the register where the rest of the record can be found.

You can have many indexes on a register, but the more you have, the slower inserting new records will be (because each index needs a new record as well - in a sorted order, which also adds time).

Ludvig A. Norin
  • 5,115
  • 7
  • 30
  • 34
1

Indices are created in an existing table to locate rows more quickly and efficiently. It is possible to create an index on one or more columns of a table, and each index is given a name. The users cannot see the indexes, they are just used to speed up queries.

Basically, your DBMS will create some sort of tree structure which points to the data (from one column) in a sorted manner. This way it is easier to search for data on that column(s).

http://en.wikipedia.org/wiki/Index_(database)

Ali
  • 261,656
  • 265
  • 575
  • 769
1

Some more index information!

Clustered indexes are the actual physical layout of the records in the table. Hence, you can only have one per table.

Nonclustered indexes are the aforementioned card catalog. Sure, the books are arranged in a particular order, but you can arrange the cards in the catalog by book size, or maybe by number of pages, or by alphabetical last name.

Something to think about -- creating too many indexes is a common pitfall. Every time your data gets updated your DB has to seek through that index and update it, inserting a record into every index on that table for that new row. In transactional systems (think: NYSE's stock transactions!) that could be an application killer.

nathaniel
  • 1,135
  • 2
  • 8
  • 20
0

for mssql (and maybe others) the syntax looks like:

create index <indexname> on <tablename>(<column1>[,<column2>...])
TrevorD
  • 544
  • 3
  • 4