I’m trying to understand how indexing in SQL Server can help improve the performance of a select query.
So my understanding is that a b-tree structure is used by sql server when indexing.
Below is a simple example.
Day (Primary Key) Race Winner
1 Dave
2 Jill
3 Jake
…
199 Jody
200 Sam
So the day number is our primary key. In the background a structure like below is used (or something similar – just an image I found). So if wanted to query the race winner on the 50th day I can see by using the structure below it can be found quickly by doing the following,
Start at the root > next 1 – 100 > next 1 – 50 & then enters the leaf 25 – 50 where I believe it will search through the rows of the data in this leaf until it finds the 50th day. Is the value contained here 50 & a pointer to the row which contains the rest of the data on that row?
So I can see how this example is quicker than searching the whole table. But I have been looking where I have a table (simplified) like below,
Date ID SEC ID AutoID
10th Jan 2015 ABC A123 1
10th Jan 2015 ABC A344 2
10th Jan 2015 DEF A123 3
10th Jan 2015 GHJ A344 4
20th Feb 2015 ABC A123 5
20th Feb 2015 ABC A344 6
20th Feb 2015 DEF A123 7
20th Feb 2015 GHJ A344 8
So I can use all 3 columns to create a primary key (natural key) or people have mentioned using an identity column i.e. a surrogate key.
Here I get lost.
How will indexing store this data and be able to quickly retrieve it like in the first example? A key value of “10th Jane 2015 ABCA123” doesn’t really mean anything (I’m probably wrongly assuming what is going on here – I believe the index combines the three columns to create a unique value that it puts in the index table). In the first example our index value actually meant something to the data, i.e. the day number.
I also don’t understand how sql server would use the AutoID? When querying the data above I would be using the Date & ID columns in the where condition so the AutoID seems pointless?