16

Possible Duplicates:
Difference between clustered and nonclustered index
What do Clustered and Non clustered index actually mean?

Hi experts,

What does the word "clustered" mean in "clustered index"? I am doubting that it has something to do with the disk sector usage. Because I vaguely remember that Windows organizes disk space into clusters, which is composed of one or more 512-byte sectors. Do these 2 concepts have any connections?

Thanks.

Community
  • 1
  • 1
smwikipedia
  • 61,609
  • 92
  • 309
  • 482
  • Take a look at http://stackoverflow.com/questions/1251636/what-do-clustered-and-non-clustered-index-actually-mean – Marek Karbarz Mar 10 '11 at 02:16
  • 5
    Seems to me he/she is asking about why it's CALLED clustered (i.e. why it's named that way) as opposed to what it is. – Phil Sandler Mar 10 '11 at 03:27
  • @Phil Sandler, yes, you're right. I really want to know the reason for the naming. – smwikipedia Mar 11 '11 at 01:56
  • 1
    @smwikipedia I had same question and closest answer was 'clustered index is clustering the actual data (as opposed to just the index keys in case of non-clustured index)'. Original answer is [here](http://goo.gl/ZdgMXz) – Roman Gudkov Sep 30 '14 at 09:33

2 Answers2

10

A clustered index represents the physical order of the records on disk. Nonclustered indices are merely "pointers" to the physical records in the table; they are in order of their key(s) and contain the data of their keys and any included columns.

Consider the index of a book vs. its page numbers: the index contains an alphabetized list of topics, and maybe it contains a summary of the topic, but the topics themselves are on the referenced pages. Page numbers, then, would be the clustered index.

It follows that you should consider choosing an immutable, monotonically increasing primary key for the clustered index so that things don't need to be rearranged when inserting and updating.

Mark Sowul
  • 10,244
  • 1
  • 45
  • 51
  • 1
    Thanks. So, *cluster = store closely*. That's kind of similar to *windows disk sector cluster* concept, but not the same. – smwikipedia Mar 10 '11 at 02:42
  • So, I am thinking there must be some kind of sorting when creating an index? – smwikipedia Mar 10 '11 at 02:46
  • 1
    No, it's not the same; a file system cluster is a quantum (smallest discrete unit) of file storage. Here we are concerned only with logical ordering; theoretically the disk pages containing the data could be in wildly different locations on the disk (though unlikely in practice as performance would suffer). – Mark Sowul Mar 10 '11 at 02:54
  • 1
    Yes, any index is sorted on its key(s). – Mark Sowul Mar 10 '11 at 02:54
  • 1
    Just some aside: I checked the Wikipedia for index related info, it turns out index is nothing but a *compact* version of the original data table which provides faster searching experience. No more and no less. – smwikipedia Mar 10 '11 at 03:01
1

Clustered means that records with similar keys are stored (for the most part) next to each other on disk. So if you have a key with just 1 integer column, the record with a value of "1" will be located next to the record with value "2". If you have multiple records, for instance questionid and answerid, then all the answers belonging to a particular question will be grouped together on disk, making it faster to access them.

Kibbee
  • 65,369
  • 27
  • 142
  • 182
  • 1
    You could have a non-clustered index on the primary key; the ordering isn't the distinguishing feature. – Mark Sowul Mar 10 '11 at 02:29
  • As you said, if I want to create a clustered index for a very big table, will it cost a lot of time to create it? Cause there'll be quite a lot of data rearrangement. – smwikipedia Mar 10 '11 at 02:30
  • But if we create a clustered-index immediately after the table is created. The following data insertion/updating will be delayed because of the index. It's a dilemma. Seems only small-dose querying can benefit from index. – smwikipedia Mar 10 '11 at 02:36
  • 1
    Depends; in the same vein you should consider choosing an immutable, monotonically increasing primary key for the clustered index so that things don't need to be rearranged when inserting and updating. Hence, you should probably set the primary key before adding data to the table so that it can be inserted in-order. I've added this advice to my answer. – Mark Sowul Mar 10 '11 at 02:36
  • That's an option indeed. – smwikipedia Mar 10 '11 at 02:37
  • 1
    Personally I find a date/time stamp to be a better clustered index candidate than an identity PK. Often queries are between date ranges so this gives you a big win. Maintenance of a non-clustered integer index is trivial and clustering PK rarely provides benefits when IDs are surrogates. – lll Mar 10 '11 at 07:00
  • I agree, but one consideration is that the clustered index is included in all nonclustered indexes (that's how they point back to the CI), so a clustered index key that takes up more space (e.g. `DATETIMEOFFSET` vs `INT`) will be multiplied into all the non-clustered indexes. And if the clustered index isn't `UNIQUE`, SQL will add a "uniqueifier" that will make it even bigger by _another_ 4 bytes. This can really add up. Further reading: https://technet.microsoft.com/en-us/library/ms190639(v=sql.105).aspx – Mark Sowul Aug 15 '17 at 20:16