4

As I know, heap tables are tables without clustered index and has no physical order. I have a heap table "scan" with 120k rows and I am using this select:

SELECT id FROM scan

If I create a non-clustered index for the column "id", I get 223 physical reads. If I remove the non-clustered index and alter the table to make "id" my primary key (and so my clustered index), I get 515 physical reads.

If the clustered index table is something like this picture:

enter image description here

Why Clustered Index Scans workw like the table scan? (or worse in case of retrieving all rows). Why it is not using the "clustered index table" that has less blocks and already has the ID that I need?

Mucida
  • 603
  • 9
  • 24

4 Answers4

5

SQL Server indices are b-trees. A non-clustered index just contains the indexed columns, with the leaf nodes of the b-tree being pointers to the approprate data page. A clustered index is different: its leaf nodes are the data page itself and the clustered index's b-tree becomes the backing store for the table itself; the heap ceases to exist for the table.

Your non-clustered index contains a single, presumably integer column. It's a small, compact index to start with. Your query select id from scan has a covering index: the query can be satisfied just by examining the index, which is what is happening. If, however, your query included columns not in the index, assuming the optimizer elected to use the non-clustered index, an additional lookup would be required to fetch the data pages required, either from the clustering index or from the heap.

To understand what's going on, you need to examine the execution plan selected by the optimizer:

Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135
  • Got it. My mistake was thinking that the b-tree for non-clustered and clustered are equivalent. – Mucida Nov 28 '14 at 18:21
  • 1
    @Mucida: the b-tree (the navigation tree) for both types of indexes *is* roughly equivalent - what's significantly different are the **leaf level** nodes in the tree... – marc_s Nov 28 '14 at 19:24
  • @marc_s, but when I have a Clsutered and A Non Clsutered index at same time in the table, the leafs for the non-clustered b-tree change and includ the clustered index? They seem to run fast when I select these two indexed columns, even the execution plan using non-clustered index scan – Mucida Nov 28 '14 at 19:31
  • 1
    @Mucida: yes, that's what I was trying to explain: the **navigation tree** for both index types is roughly the same - a b-tree with the index columns. The **leaf level** is very different: the actual **data pages** for the clustered index, and the nonclustered index contains the index columns, plus the clustered index columns (if a clustered index exists - or the RID = row identifier if a clustered index doesn't exist) – marc_s Nov 28 '14 at 19:33
2

A clustered index generally is about as big as the same data in a heap would be (assuming the same page fullness). It should use just a little more reads than a heap would use because of additional B-tree levels.

A CI cannot be smaller than a heap would be. I don't see why you would think that. Most of the size of a partition (be it a heap or a tree) is in the data.

Note, that less physical reads does not necessarily translate to a query being faster. Random IO can be 100x slower than sequential IO.

usr
  • 168,620
  • 35
  • 240
  • 369
  • The problem is that the Clustered Index Scan is using the same number of physical reads that the table scan. Why Non-Clustered Index Scan is running so faster than Clustered Insdex scan in this case? – Mucida Nov 28 '14 at 18:11
  • I explained that in the 1st paragraph. What exactly did you not understand? Why do you think the CI scan should be faster? – usr Nov 28 '14 at 18:14
  • I think that is my doubt: Does the picture really represents a Clustered Index (with pointer to the rest of rows)? Or just the Non-Clustered Index is like that? If the last one is correct, I can understand why NCI worked better than CI – Mucida Nov 28 '14 at 18:17
  • 1
    The CI contains all columns inline. The NCI contains only those columns that you specify plus all CI key columns. – usr Nov 28 '14 at 18:52
1

When to use Clustered Index-

Query Considerations:
1) Return a range of values by using operators such as BETWEEN, >, >=, <, and <= 2) Return large result sets
3) Use JOIN clauses; typically these are foreign key columns
4) Use ORDER BY, or GROUP BY clauses. An index on the columns specified in the ORDER BY or GROUP BY clause may remove the need for the Database Engine to sort the data, because the rows are already sorted. This improves query performance.

Column Considerations : Consider columns that have one or more of the following attributes: 1) Are unique or contain many distinct values 2) Defined as IDENTITY because the column is guaranteed to be unique within the table 3) Used frequently to sort the data retrieved from a table

Clustered indexes are not a good choice for the following attributes: 1) Columns that undergo frequent changes 2) Wide keys

When to use Nonclustered Index-

Query Considerations:
1) Use JOIN or GROUP BY clauses. Create multiple nonclustered indexes on columns involved in join and grouping operations, and a clustered index on any foreign key columns.
2) Queries that do not return large result sets
3) Contain columns frequently involved in search conditions of a query, such as WHERE clause, that return exact matches

Column Considerations :
Consider columns that have one or more of the following attributes:
1) Cover the query. For more information, see Index with Included Columns
2) Lots of distinct values, such as a combination of last name and first name, if a clustered index is used for other columns
3) Used frequently to sort the data retrieved from a table

Database Considerations:
1) Databases or tables with low update requirements, but large volumes of data can benefit from many nonclustered indexes to improve query performance.
2) Online Transaction Processing applications and databases that contain heavily updated tables should avoid over-indexing. Additionally, indexes should be narrow, that is, with as few columns as possible.

ManJan
  • 3,939
  • 3
  • 20
  • 22
0

Try running

DBCC DROPCLEANBUFFERS

Before the queries...

If you really want to compare them. Physical reads don't mean the same as logical reads when optimizing a query

Sparky
  • 14,967
  • 2
  • 31
  • 45
  • 1
    Given that his table seems to be only 4MB in size I think we can assume that he did this. If not he would not have received a single physical read. – usr Nov 28 '14 at 18:10
  • Good point usr, but I am not sure if the cache data for a table is maintained after an alter table command.... – Sparky Nov 28 '14 at 18:12
  • Does the picture really represents a Clustered Index (with pointer to the rest of rows)? Or just the Non-Clustered Index is like that? If the last one is correct, I can understand why NCI worked better than CI – Mucida Nov 28 '14 at 18:16