How is sequential traversing between intermediate leaves of clustered index used?

Question

Intermediate leaves of clustered index are linked sequentially (next, previous) for faster access (between intermediate nodes) [1], [2], etc.:

alt text

How this access is used?
Why is it needed?

[1]
Clustered Index Structures
http://msdn.microsoft.com/en-us/library/ms177443.aspx
[2]
Clustered Tables vs Heap Tables
http://www.mssqltips.com/tip.asp?tip=1254

Update: Follow-up question for answerers:

What are the helpful cases of Internals Viewer for SQL Server to convince me to use it?

It is not me voting your every question down. Someone is following you and smacking you every time. — PerformanceDBA, Oct 30 '10 at 08:41
Thanks, this is a relief to know because I started to be suspicious. I understand that when a guru cannot say anything by copy&pasting google results, the fault is in bad question or in bad questioner — Gennady Vanin Геннадий Ванин, Oct 30 '10 at 09:01
Er, no. That is nonsense, for many reasons. For one, the web is full of rubbish, not serious technical info. Two, a guru can answer directly without having to reference even good texts. Although they may not wish to type great heaps of info, every time. — PerformanceDBA, Oct 30 '10 at 09:14
@vgv8: I don't follow you around and vote down but I do think you're spamming with all these questions. It makes no difference to how you use SQL Server for most cases until you have high complexity or volumes — gbn, Oct 30 '10 at 09:43
@PerformanceDBA, What info (or contribution) do anonymous downvotes bring to others (if to ignore me and the fact that I do not get the smallest clue what they are supposed to convey and how can I improve my questioning skills and become a better contributor) from these opinions? — Gennady Vanin Геннадий Ванин, Oct 30 '10 at 09:47
@gbn, we have have opposite definitions of spamming. Most FAQ questions, repeated in SO dozens of times and massively upvoted have easy answers by simple googling. It is not about how to use, and not about SQL Server only (the context of which was given just as example). It is about how to understand and how to study the basic fundamentals blurred by docs and contradictory internet articles, blogs, posts — Gennady Vanin Геннадий Ванин, Oct 30 '10 at 10:00
None. Zero. It is anti-social. I have a stalker following me, who down-votes my every post, and I have seen the pattern; so it was obvious when I saw the same pattern happening to you. maybe it is my stalker who is jealous because I ignore her, and I give you so much of my attention. — PerformanceDBA, Oct 30 '10 at 10:01
@vgv8: It's all described in the official MSDN docs. How can this be contradictory or out of date? — gbn, Oct 30 '10 at 10:05
@gbn, It is not even about correct answers (or questions) given by others, it is about learning to acquire techniques to get answers myself. Plz give me links to msdn docs, I could not find descriptions related to this question — Gennady Vanin Геннадий Ванин, Oct 30 '10 at 10:10
@gbn: Anyone relying on MS or MSDN docs, is going to get a few nasty surprises re their accuracy, quality and currency. Just query the sys% tables yourself. I have provided specifics, re exactly why the fluffy nice-looking diagrams are technically inaccurate; and fall apart as evidenced when answering technical questions (such as vgv8's). I am happy to answer specific points; generalities such as yours are impossible to answer. — PerformanceDBA, Oct 30 '10 at 20:48

PerformanceDBA · Accepted Answer · 2012-07-09T09:22:29.110

The clustered index (and not non-clustered indices) can be used for range queries. Do you know what that is ? Horizontal traversal of the B-Tree enhances the speed of navigating the CI when determining qualified rows during range queries.

In a more general sense, if the server cache is too small, and the CI pages get paged out, when any query (not only the range queries) need to get the next page when walking down, or sideways, through a CI, it can get the page with a single disk access, because the pages are linked by a pointer; ie. it avoids walking back up one level to find the next page). Just one of the many reasons CIs are much faster than NCIs; they are far more enhanced because the NCI depends on them (your other question today).

The diagram has mistakes (contains false info), or to put it more precisely, it is a descriptive, non-technical diagram, from a non-technical corporation:

The intermediate levels have a single pointer to the page at the next level (not multiple pointers).
The leaf level IS the data row. There are no pointers to rows (at the intermediate OR leaf level).
The Index Pages do not resemble a page of text and images. Each Index Page contains hundreds of index B-Tree entries.
The Root page is different only in that the first entry is the single root to the index ; it contains hundreds of entries which are of course second level, and may be third level, etc.

There is a reason technicians draw, and use, technical drawings: among others, it avoids misunderstandings and confusion. No questions re the Diagram I Made for You ?

Response to Martin Smith's Post

a. Me: The clustered index (and not non-clustered indices) can be used for range queries

MS: Incorrect: Non-clustered indices can be used perfectly well for range queries as long as the Non Clustered Index is covering.

It appears you understand a Covered Query, but you do not understand a Range Query. Please read up on it. It is unfortunately named "query", but actually it is a performance technique that all the SQL vendors provide. Say you have a real Relational table, which means a composite key, eg. Invoice PK is (CustomerId, InvoiceNo) [not InvoiceId]. Then a query such as:

SELECT * FROM Invoice WHERE CustomerId = @CustomerId

will navigate the B-Tree of the ClusteredIndex once, to find the fist Invoice for the CustomerId. It will then follow the PageChain of the LeafLevel (data rows) to obtain the second and subsequent Invoice for the CustomerId. There is no further use of the B-Tree for the query. The Range Query ends when the first Invoice with CustomerId > 1 is encountered.

That is only possible with a ClusteredIndex, where the B-Tree is married to the Data, in a single physical structure.

That is physically impossible with a NonClusteredIndex-plus-Data (which is a Heap or a ClusteredIndex). Which is why Range Queries cannot be supported for NCIs. Even if you had an NCI with (CustomerId, InvoiceNo), the data rows will not be in that order; they will be in chronological order in the Heap; so the query that uses that NCI will extract one-row-per-NCI-entry.

b. Me: CIs are much faster than NCIs; they are far more enhanced because the NCI depends on them

MS: The B tree structure of a clustered index is no different from a non clustered index. CIs are not enhanced or somehow have a different and superior structure ...

No dispute there. You have simply misunderstood me, re speed, I was talking about the table overall (NonClusteredIndices cannot exist on their own). Let me clarify: Given the same Key, a ClusteredIndex (which includes data) is always much faster than a NonClusteredIndex-plus-Heap. Navigating, maintaining, creating, deleting from, a single data-storage structure (the CI), is obviously much faster than doing the same to two data-storage structures (NCI+Heap).

It is not physically possible to make two DSs faster than one DS (assuming with the same key.)

c. Not worth a response. It appears you do not realise that my comments pertain to the incorrect diagrams. Put another way, your comments (and proof) are also quite correct.

Thanks for your time butchering technical diagram for me. Though I'd have preferred going with mainstream publicly available docs/refs. I look every day into it, but just now I urgently deviated to reading your reply on EAV http://stackoverflow.com/questions/4011956/multiple-fixed-tables-vs-flexible-abstract-tables/4013207#4013207 — Gennady Vanin Геннадий Ванин, Oct 30 '10 at 10:35
Perhaps you missed the last para above. If the mainstrean docs were good enough people would not waste time creating technically accurate docs. Perhaps you've missed the point that, the mainstream doc you posted leads to confusion; which is what caused you to post; which is what I have answered; and had to move away from such docs in doing so. the problem is that some people take MS as gospel; it isn't. I am happy to answer specific questions; I won't waste time responding to the mountain of MS rubbish. — PerformanceDBA, Oct 30 '10 at 20:58
Hi, I shall ask questions later and probably not in this board. I am shifting to sqlservercentral.com/Forums/Forum373-1.aspx Thanks, once again for your time! I had already here over a dozen of bans without any warning for such, irritating local gurus, questions) or for having asked why had I been banned before (and am still banned in meta. stackoverlow.com) — Gennady Vanin Геннадий Ванин, Nov 01 '10 at 08:19
My pleasure, it was good while it lasted, eh? Sorry to hear that. Have fun. — PerformanceDBA, Nov 01 '10 at 09:49
I disagree that the diagram contains any innacuracies. It seems perfectly satisfactory to me. — Martin Smith, Nov 21 '10 at 14:35
@Martin: do you have anything *specific* to say about the *specific* errors [1] to [4] that I have pointed out ? Also, have a look at vgv8's answer below. — PerformanceDBA, Nov 25 '10 at 16:40

score 1 · Answer 2 · answered Nov 10 '10 at 06:47

First , as pointed out PerfomanceDBA, in order to understand SQL Server internals it is better to use Sybase documentation and terminology.

Second, a good explanation where, why and how intermediate level consecutive transversing is explained in [1]:

"Read-Ahead in a key-ordered scan

With key-ordered scans, the engine uses the information stored in the intermediate index pages 1 level above the leaf level to schedule serial read-aheads for pages that contain the keys found. If a request is made for example for all keys from 1 to 100, the engine will first read the index page above the leaf page for key 1 (on it's way to traversing to the leaf page); however, instead of simply reading each leaf page in sequence from page 1 to page 100, the engine scans the intermediate level page and builds a list of the leaf pages that must be read to get pages 1 thru 100, then schedules all reads in key order - in addition, the engine will recognize if pages are contiguous and perform a single read to retrieve the adjacent pages in a single operation as opposed to multiple, smaller operations. A similar type of operation is used to pre-fetch data from a base cluster or heap when scanning through a non-clustered index - since the leaf rows of a non-clustered index contain pointers to the data rows in the cluster/heap structure, as the storage engine reads through the leaf of the non-clustered index, it also starts scheduling asynchronous reads for the corresponding data rows whose pointers have already been retrieved. This allows the engine to fetch data efficiently from the underlying cluster/heap before it has completed the scan of the non-clustered index.

The navigation for a read-ahead in a key-ordered scan would look something like the following:

alt text
"

[1]
Chad Boyd
MSSQLTips - SQL Server Blog
Fragmentation Station - Stop #1 - Storage basics and Access Methods http://blogs.mssqltips.com/blogs/chadboyd/archive/2007/11/12/fragmentation-station-stop-1-storage-basics-and-access-methods.aspx

+1 this is the correct answer. Intermediate page link is required exlusively for read ahead. Range scans (any scan actually) navigate the leaf level link chain and never have go back up to the intermediate level. — Remus Rusanu, Jul 09 '12 at 09:47
@RemusRusanu No. You have found evidence for one use, correct. But then deciding that it is an exclusive use, is completely unreasonable. There are many other uses. Check your monitoring (metrics and output) for evidence. Think about Partitioned tables, etc. — PerformanceDBA, Oct 13 '21 at 10:43

Martin Smith · Answer 3 · 2010-11-21T23:07:26.017

The diagram you have in your question represents indexes in Microsoft SQL Server perfectly accurately.

To address some of the aspects of PerformanceDBA's answer that I believe to be incorrect or inadequately explained.

"The clustered index (and not non-clustered indices) can be used for range queries"

Incorrect: Non-clustered indices can be used perfectly well for range queries as long as the Non Clustered Index is covering.

"CIs are much faster than NCIs; they are far more enhanced because the NCI depends on them"

The B tree structure of a clustered index is no different from a non clustered index. CIs are not enhanced or somehow have a different and superior structure. If anything NCIs are slightly enhanced in that they don't always have a NULL_BITMAP and the "Status Bits B" byte and are thus may be slightly more compact.

"The intermediate levels have a single pointer to the page at the next level (not multiple pointers) ... There are no pointers to rows (at the intermediate OR leaf level)."

USE tempdb

IF OBJECT_ID('testing') IS NULL
BEGIN
    CREATE TABLE testing
    (
    a INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    b INT NOT NULL,
    c CHAR(4000) NOT NULL DEFAULT REPLICATE('c',4000),
    d CHAR(4000) NOT NULL DEFAULT REPLICATE('d',4000)
    )

    CREATE UNIQUE NONCLUSTERED  INDEX ix ON testing (b) INCLUDE (d)

    INSERT INTO testing (b)
    SELECT TOP 3000 ROW_NUMBER() OVER (ORDER BY (SELECT 0))
    FROM sys.all_columns s1, sys.all_columns s2
END

IF OBJECT_ID('index_pages') IS NULL
BEGIN
CREATE TABLE index_pages
(
PageFID TINYINT,
PagePID INT,
IAMFID TINYINT,
IAMPID INT,
ObjectID INT,
IndexID TINYINT,
PartitionNumber TINYINT,
PartitionID BIGINT,
iam_chain_type VARCHAR(30),
PageType TINYINT,
IndexLevel TINYINT,
NextPageFID TINYINT,
NextPagePID INT,
PrevPageFID TINYINT,
PrevPagePID INT,
PRIMARY KEY (PageFID, PagePID)
) 
END
ELSE
TRUNCATE TABLE index_pages

INSERT INTO index_pages
EXEC('DBCC IND(tempdb, testing, 2)') 

SELECT * 
FROM index_pages 
ORDER BY IndexLevel DESC

You will see that the level one (intermediate level) pages have horizontal pointers denoted by the NextPagePID and PrevPagePID columns. In addition to these page level pointers each index entry has a pointer to a page in the next level down as indicated correctly in the diagram.

To see this select one of the PagePIDs belonging to a level one page and look at that page in Internals Viewer for SQL Server. You will see (as below) that each index record has its own Down Page Pointer.

In the particular individual record selected in the screen shot below it can be seen that it is showing that the first record on the leaf page 1:186 will have a key value of 13 or later.

How is sequential traversing between intermediate leaves of clustered index used?

3 Answers3