What is an index in SQL?

Question

Also, when is it appropriate to use one?

score 439 · Accepted Answer · edited Jan 09 '23 at 08:24

An index is used to speed up searching in the database. MySQL has some good documentation on the subject (which is relevant for other SQL servers as well): http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html

An index can be used to efficiently find all rows matching some column in your query and then walk through only that subset of the table to find exact matches. If you don't have indexes on any column in the WHERE clause, the SQL server has to walk through the whole table and check every row to see if it matches, which may be a slow operation on big tables.

The index can also be a UNIQUE index, which means that you cannot have duplicate values in that column, or a PRIMARY KEY which in some storage engines defines where in the database file the value is stored.

In MySQL you can use EXPLAIN in front of your SELECT statement to see if your query will make use of any index. This is a good start for troubleshooting performance problems. Read more here: http://dev.mysql.com/doc/refman/5.0/en/explain.html

"The index can also be a UNIQUE index..." Just to confirm with you, does that means index can al be Non-UNIQUE? I always thought its UNIQUE. I'm quite new to SQL, pardon me — Daniel Kurniadi, May 16 '20 at 05:01
@DanielKurniadi Most indexes are not unique. I might have a user database and want to query for everyone that lives in Gothenburg. An index on the "city" field would speed up my query. But there are more than 1 user living in Gothenburg so the index must be non-unique. — Emil Vikström, May 17 '20 at 13:50

score 189 · Answer 2 · edited Dec 24 '22 at 20:37

A clustered index is like the contents of a phone book. You can open the book at 'Hilditch, David' and find all the information for all of the 'Hilditch's right next to each other. Here the keys for the clustered index are (lastname, firstname).

This makes clustered indexes great for retrieving lots of data based on range based queries since all the data is located next to each other.

Since the clustered index is actually related to how the data is stored, there is only one of them possible per table (although you can cheat to simulate multiple clustered indexes).

A non-clustered index is different in that you can have many of them and they then point at the data in the clustered index. You could have e.g. a non-clustered index at the back of a phone book which is keyed on (town, address)

Imagine if you had to search through the phone book for all the people who live in 'London' - with only the clustered index you would have to search every single item in the phone book since the key on the clustered index is on (lastname, firstname) and as a result the people living in London are scattered randomly throughout the index.

If you have a non-clustered index on (town) then these queries can be performed much more quickly.

These two videos ([Clustered vs. Nonclustered Index Structures in SQL Server](https://www.youtube.com/watch?v=ITcOiLSfVJQ&t=139s&ab_channel=VoluntaryDBA) and [Database Design 39 - Indexes (Clustered, Nonclustered, Composite Index)](https://www.youtube.com/watch?v=EZ3jBam2IEA&ab_channel=CalebCurry) ) are very helpful for understanding what **clustered** means if you are a newbie like me, doesn't even know what index is. — Rick, Jan 25 '21 at 15:04
Clustered indexes were probably badly named - they're really just the physical ordered storage of all of the data for this table. I guess they were called 'clustered' because clustered indexes work very well for range-based queries. e.g. if your clustered index key is on DateCreated, it's very very fast to run queries like SELECT * from table where DateCreated BETWEEN 2020-01-01 and 2020-02-01. The non-clustered indexes are really just extra indexes - you have the index keys pointing to page numbers (from the clustered index) where all incidences of that data can be found. — Dave Hilditch, Mar 16 '21 at 17:01

Mitch Wheat · Answer 3 · 2013-09-15T00:50:29.113

An index is used to speed up the performance of queries. It does this by reducing the number of database data pages that have to be visited/scanned.

In SQL Server, a clustered index determines the physical order of data in a table. There can be only one clustered index per table (the clustered index IS the table). All other indexes on a table are termed non-clustered.

score 63 · Answer 4 · edited Jan 06 '19 at 14:32

63

Indexes are all about finding data quickly.

Indexes in a database are analogous to indexes that you find in a book. If a book has an index, and I ask you to find a chapter in that book, you can quickly find that with the help of the index. On the other hand, if the book does not have an index, you will have to spend more time looking for the chapter by looking at every page from the start to the end of the book.

In a similar fashion, indexes in a database can help queries find data quickly. If you are new to indexes, the following videos, can be very useful. In fact, I have learned a lot from them.

Index Basics
Clustered and Non-Clustered Indexes
Unique and Non-Unique Indexes
Advantages and disadvantages of indexes

edited Jan 06 '19 at 14:32

SandPiper

2,816
5
30
52

answered Sep 25 '12 at 12:39

Suresh

987
10
4

1

Reading all the answers made me wonder why not index everything. +1 for including the link containing the disadvantages. – lakshayg Mar 04 '18 at 19:23
1

@LakshayGarg Sometimes unnecessary indexing can also slow down the execution time of the query, so we should not try index everything. Just like everything has it's own pros and cons. – Gaurav Rajdeo Jun 27 '18 at 03:31
@LakshayGarg Gaurav Rajdeo is right. Too keep the same analogy: You may want to index every chapter or figure or table in a book but not every paragraph, sentence or word. Normally, that would be an overkill and would lead to unnecessary complication. Hope that this is close enough to the drawn picture in this answer. – colidyre Sep 30 '19 at 13:39

score 25 · Answer 5 · edited Mar 29 '17 at 18:45

25

Well in general index is a B-tree. There are two types of indexes: clustered and nonclustered.

Clustered index creates a physical order of rows (it can be only one and in most cases it is also a primary key - if you create primary key on table you create clustered index on this table also).

Nonclustered index is also a binary tree but it doesn't create a physical order of rows. So the leaf nodes of nonclustered index contain PK (if it exists) or row index.

Indexes are used to increase the speed of search. Because the complexity is of O(log N). Indexes is very large and interesting topic. I can say that creating indexes on large database is some kind of art sometimes.

edited Mar 29 '17 at 18:45

David Andrei Ned

799
1
11
28

answered Jun 02 '10 at 06:35

Voice

1,547
16
31

7

in general, its a b-tree rather than a binary tree. – Mitch Wheat Jun 02 '10 at 07:35
so, because indexes use self-balancing trees, every time you add/delete a row it will balance itself - making insert/deletion more expensive... correct? – Maverick Meerkat Jun 19 '18 at 11:38

score 23 · Answer 6 · edited Sep 20 '18 at 20:15

First we need to understand how normal (without indexing) query runs. It basically traverse each rows one by one and when it finds the data it returns. Refer the following image. (This image has been taken from this video.)

So suppose query is to find 50 , it will have to read 49 records as a linear search.

Refer the following image. (This image has been taken from this video)

When we apply indexing, the query will quickly find out the data without reading each one of them just by eliminating half of the data in each traversal like a binary search. The mysql indexes are stored as B-tree where all the data are in leaf node.

I watched the video linked in this answer. It's good and worth a read. — Rick, Jan 25 '21 at 14:45

score 22 · Answer 7 · edited Mar 29 '17 at 21:41

22

INDEXES - to find data easily

UNIQUE INDEX - duplicate values are not allowed

Syntax for INDEX

CREATE INDEX INDEX_NAME ON TABLE_NAME(COLUMN);

Syntax for UNIQUE INDEX

CREATE UNIQUE INDEX INDEX_NAME ON TABLE_NAME(COLUMN);

edited Mar 29 '17 at 21:41

CDspace

2,639
18
30
36

answered Dec 04 '13 at 04:37

RAGU

221
2
2

score 15 · Answer 8 · edited Apr 13 '21 at 14:22

INDEX is a performance optimization technique that speeds up the data retrieval process. It is a persistent data structure that is associated with a Table (or View) in order to increase performance during retrieving the data from that table (or View).

Index based search is applied more particularly when your queries include WHERE filter. Otherwise, i.e, a query without WHERE-filter selects whole data and process. Searching whole table without INDEX is called Table-scan.

You will find exact information for Sql-Indexes in clear and reliable way: follow these links:

For cocnept-wise understanding: http://dotnetauthorities.blogspot.in/2013/12/Microsoft-SQL-Server-Training-Online-Learning-Classes-INDEX-Overview-and-Optimizations.html
For implementation-wise understanding: http://dotnetauthorities.blogspot.in/2013/12/Microsoft-SQL-Server-Training-Online-Learning-Classes-INDEX-Creation-Deletetion-Optimizations.html

score 6 · Answer 9 · answered Jun 02 '10 at 06:34

If you're using SQL Server, one of the best resources is its own Books Online that comes with the install! It's the 1st place I would refer to for ANY SQL Server related topics.

If it's practical "how should I do this?" kind of questions, then StackOverflow would be a better place to ask.

Also, I haven't been back for a while but sqlservercentral.com used to be one of the top SQL Server related sites out there.

score 5 · Answer 10 · answered Jun 02 '10 at 06:29

An index is used for several different reasons. The main reason is to speed up querying so that you can get rows or sort rows faster. Another reason is to define a primary-key or unique index which will guarantee that no other columns have the same values.

Pankaj Tanwar · Answer 11 · 2021-08-24T05:50:45.450

So, How indexing actually works?

Well, first off, the database table does not reorder itself when we put index on a column to optimize the query performance.

An index is a data structure, (most commonly its B-tree {Its balanced tree, not binary tree}) that stores the value for a specific column in a table.

The major advantage of B-tree is that the data in it is sortable. Along with it, B-Tree data structure is time efficient and operations such as searching, insertion, deletion can be done in logarithmic time.

So the index would look like this -

Here for each column, it would be mapped with a database internal identifier (pointer) which points to the exact location of the row. And, now if we run the same query.

Visual Representation of the Query execution

So, indexing just cuts down the time complexity from o(n) to o(log n).

A detailed info - https://pankajtanwar.in/blog/what-is-the-sorting-algorithm-behind-order-by-query-in-mysql

score 2 · Answer 12 · answered Oct 17 '21 at 04:58

2

INDEX is not part of SQL. INDEX creates a Balanced Tree on physical level to accelerate CRUD.

SQL is a language which describe the Conceptual Level Schema and External Level Schema. SQL doesn't describe Physical Level Schema.

The statement which creates an INDEX is defined by DBMS, not by SQL standard.

answered Oct 17 '21 at 04:58

Zim

1,528
1
10
6

1

"...INDEX is not part of SQL..." +1 just for that. Indexes are unrelated to the SQL Standard and are not mentioned anywhere in it. Indexes are only implementation techniques to speed up operations. – The Impaler Jul 21 '22 at 16:48
However, there are many types of indexes (I can name 6 off the top of my head). B+ Trees are just one of them. – The Impaler Jul 21 '22 at 16:49

score 0 · Answer 13 · answered Feb 21 '20 at 17:44

An index is an on-disk structure associated with a table or view that speeds retrieval of rows from the table or view. An index contains keys built from one or more columns in the table or view. These keys are stored in a structure (B-tree) that enables SQL Server to find the row or rows associated with the key values quickly and efficiently.

Indexes are automatically created when PRIMARY KEY and UNIQUE constraints are defined on table columns. For example, when you create a table with a UNIQUE constraint, Database Engine automatically creates a nonclustered index.

If you configure a PRIMARY KEY, Database Engine automatically creates a clustered index, unless a clustered index already exists. When you try to enforce a PRIMARY KEY constraint on an existing table and a clustered index already exists on that table, SQL Server enforces the primary key using a nonclustered index.

Please refer to this for more information about indexes (clustered and non clustered): https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described?view=sql-server-ver15

Hope this helps!

What is an index in SQL?

13 Answers13

So, How indexing actually works?

Visual Representation of the Query execution

Linked

Related