Understanding clustered index

Question

Since PostgreSQL doesn't support clustered indexes, I'm considering MSSQL server. I've read the article comparing clustered and non-clustered indexes. The gist of the article is that (emphasize mine):

Non clustered indexes store both a value and a pointer to the actual row that holds that value.

And

Clustered indexes don’t need to store a pointer to the actual row because of the fact that the rows in the table are stored on disk in the same exact order as the clustered index

As I was told there and there it was very difficult to support the physical ordering of the table's data, especially if the table is splitted among multiple drives. And now, I meet the clustered index concept assuming that data stored in some order physically. This's what I was confused by.

Question: What is the clustered index structure? Does it support tree-like structure to traverse over, like PosgtreSQL does for btree indexes?

So you want to migrate from PostgreSQL to MS SQL Server just to get clustered indexes?!? — jarlh, Oct 06 '15 at 12:13
@jarlh Not exactly, I'm just trying to undersatnd that concept by `MSSQL` example. In particular, if a clustered index meas just ordering data physically (somehow), it'll be clear. But how can it I tie with btree structure and physical order. I can't imagine any way it implemented.... — St.Antario, Oct 06 '15 at 12:15
What bit's tripping you up? The idea that data can be ordered, or that the row's data can be part of the index? — Rowland Shaw, Oct 06 '15 at 12:16
@RowlandShaw The rows can be a part of the index, yes. I can't imagine how it's possible... If MSSQL also assumes underlying btree structure for clusterd indexes... — St.Antario, Oct 06 '15 at 12:17
It probably won't help to point out that SQL Server allows non-clustered indexes to include some non-indexed data too: https://msdn.microsoft.com/en-us/library/ms190806.aspx — Rowland Shaw, Oct 06 '15 at 12:19
@RowlandShaw Ah, it seems I got it. In non-clustered indexes we need to traverse the btree, to get the __pointer__ to a row, and the to get the row by that pointer. It causes drive-head moving to get the pointer first, and an additional moving to get the actual row. Moreover, the table is stored page-by-page physically which reduces head moving overhead even more. Is that correct understanding of that concept? — St.Antario, Oct 06 '15 at 12:24
Seems like an academic discussion. I would add that not all types of data should be considered a good candidate for a clustered index, like guids, and strings. You typically want continuous data, like a datetime or an auto-incrementing field. — ewahner, Oct 06 '15 at 13:00
@ewahner To me, it sounds not quite obviously... Maybe you can give some hint to understand what you're talking about? — St.Antario, Oct 06 '15 at 13:11
If lets say you were to use a GUID as a clustered index your data would look like `('F0CD53B7-5842-49B4-A685-1CEFF8EE750F', 'A3DB7FD6-19B8-4278-AEFE-949DBCD1C9A4')` and because of the nature of a clustered index the data is sorted. All subsequent inserts into that table would cause a re-org of the data due to data shifting and moving to accommodate new entries. Whereas with a datetime...all new inserts would just append to the end and no re-org would be required. This makes inserts much less costly when you choose the correct data type for your CL index. — ewahner, Oct 06 '15 at 13:31
The article you link is wrong. See http://stackoverflow.com/questions/1251636/what-do-clustered-and-non-clustered-index-actually-mean/24470091#24470091 — Martin Smith, Feb 06 '16 at 21:24

score 13 · Accepted Answer · answered Oct 06 '15 at 12:14

In SQL Server, indexes are organized as B-trees. Each page in an index B-tree is called an index node. The top node of the B-tree is called the root node. The bottom level of nodes in the index is called the leaf nodes. Any index levels between the root and the leaf nodes are collectively known as intermediate levels. In a clustered index, the leaf nodes contain the data pages of the underlying table. The root and intermediate level nodes contain index pages holding index rows. Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index. The pages in each level of the index are linked in a doubly-linked list.

Clustered indexes have one row in sys.partitions, with index_id = 1 for each partition used by the index. By default, a clustered index has a single partition. When a clustered index has multiple partitions, each partition has a B-tree structure that contains the data for that specific partition. For example, if a clustered index has four partitions, there are four B-tree structures; one in each partition.

for ref.

https://technet.microsoft.com/en-us/library/ms177443(v=sql.105).aspx http://www.sqlservercentral.com/blogs/practicalsqldba/2013/03/14/sql-server-part-4-explaining-the-non-clustered-index-structure-/

So, in addititional to the phisically ordered table, we have the index-rows (Root and intermidiate-level) the actuall traversing to the physical row is performed by, right? — St.Antario, Oct 06 '15 at 12:33
So are you saying that a clustered index with only one partition does NOT have a B-Tree? — Brain2000, Feb 07 '18 at 20:25

score 0 · Answer 2 · answered Oct 06 '15 at 12:52

In clustered index there are three levels

1.Root level

2.Intermediate level

3.Leaf level

The clustered index contains the data rows at the leaf level. if you’re searching a value in the indexed column, the query engine would first look the value at the root level if the value is available at the root level then query engine will not go to intermediate level or leaf level. if the value is not founded on Root level then it will search the value in intermediate level or leaf level. if the amount of data rows is too small then there is no intermediate level available in clustered index.

the below diagram can help you to understand the basic of clustered index:

Understanding clustered index

2 Answers2