0

I’m trying to understand how indexing in SQL Server can help improve the performance of a select query.

So my understanding is that a b-tree structure is used by sql server when indexing.

Below is a simple example.

Day (Primary Key)   Race Winner
1                   Dave
2                   Jill
3                   Jake
…   
199                 Jody
200                 Sam

So the day number is our primary key. In the background a structure like below is used (or something similar – just an image I found). So if wanted to query the race winner on the 50th day I can see by using the structure below it can be found quickly by doing the following,

Start at the root > next 1 – 100 > next 1 – 50 & then enters the leaf 25 – 50 where I believe it will search through the rows of the data in this leaf until it finds the 50th day. Is the value contained here 50 & a pointer to the row which contains the rest of the data on that row?

b-tree

So I can see how this example is quicker than searching the whole table. But I have been looking where I have a table (simplified) like below,

Date            ID  SEC ID  AutoID
10th Jan 2015   ABC A123    1
10th Jan 2015   ABC A344    2
10th Jan 2015   DEF A123    3
10th Jan 2015   GHJ A344    4
20th Feb 2015   ABC A123    5
20th Feb 2015   ABC A344    6
20th Feb 2015   DEF A123    7
20th Feb 2015   GHJ A344    8

So I can use all 3 columns to create a primary key (natural key) or people have mentioned using an identity column i.e. a surrogate key.

Here I get lost.

How will indexing store this data and be able to quickly retrieve it like in the first example? A key value of “10th Jane 2015 ABCA123” doesn’t really mean anything (I’m probably wrongly assuming what is going on here – I believe the index combines the three columns to create a unique value that it puts in the index table). In the first example our index value actually meant something to the data, i.e. the day number.

I also don’t understand how sql server would use the AutoID? When querying the data above I would be using the Date & ID columns in the where condition so the AutoID seems pointless?

mHelpMe
  • 6,336
  • 24
  • 75
  • 150
  • 2
    This reminds me of this http://stackoverflow.com/q/32182339/5089204 – Shnugo Aug 25 '15 at 09:03
  • Yes the answer from that question helped me make the decision to use a the three columns for my primary key. Here I'm trying to understand what is going on - i.e. how the data is stored – mHelpMe Aug 25 '15 at 09:10
  • @mxix - nonsense. A primary key is a unique constraint that doesn't allow nulls. *By default*, the primary key will also be the clustered index but that is *only* a default. There is no requirement that the PK be the clustered index. – Damien_The_Unbeliever Aug 25 '15 at 09:16

1 Answers1

2

Is the value contained here 50 & a pointer to the row which contains the rest of the data on that row?

It depends. In the clustered index for a table (there can be only one), what is stored at the leaves is the complete row's data. The clustered index is where the data is actually stored. In a non-clustered index, what's stored at the leaves is the clustered index column(s) values, so that a lookup can occur.

By default, the primary key will become the clustered index but that's only a default, so either situation can occur.

In a multi-column index, yes, effectively what's stored within the levels of the index are the values from all columns combined. This is why, with a multi-column index, the index is only effective if the leftmost n columns of the index (n <= number of columns in the index) are used for search criteria.

In your second example, if the index is defined on Date, ID and SEC ID, in that order, and you have a query with a WHERE clause of ID = 'ABC' then the index quite simply cannot be used - because the first part of each key is the Date.

Damien_The_Unbeliever
  • 234,701
  • 27
  • 340
  • 448
  • The leaf contains the complete row’s data for clustered indices. So the nodes it uses to search to find the leaf are the index values (say we were using a primary key which happened to be a passport number, the nodes would contain the passport number to help find the correct leaf and then retrieve the all the data associated with that row?) – mHelpMe Aug 25 '15 at 09:38
  • Also in a multicolumn index all the values are combined is that into one string or is it three separate values but when searching it knows the are three values to check, starting with the Date, ID & SEC ID in my case? I take it if I have a where clause where I use Date & ID (in that order & not using SEC ID) it will still be able to make use of my index? – mHelpMe Aug 25 '15 at 09:38