In SQL Server (and most other relational databases), a "Composite Index" is an index with multiple keys. Let's say we have this query that gets run a lot, and we want to create a covering index for this query to speed it up;
SELECT a, b FROM MyTable WHERE c = @val1 AND d = @val2
These are all possible composite indexes that would cover this query;
CREATE INDEX ix1 ON MyTable (c, d, a, b)
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
CREATE INDEX ix3 ON MyTable (d) INCLUDE (a, b, c)
CREATE INDEX ix4 ON MyTable (c) INCLUDE (a, b, d)
But apparently, they don't perform equally. According to Erlan Sommarskog (Microsoft MVP), the first two are faster than the 3rd and 4th, and the 4th is faster than the 3rd.
He goes on to explain;
ix2 is the "best" index, because a and b will not take up space in the higher levels of the index tree. Also, if a or b are updated, in ix2 there can be no page splits or similar as the index tree is unaffected.
However, I am having a hard time grasping what exactly is going on. I do have the general knowledge on b-tree indexes and how they work, but I don't understand the logic behind composite keys. For example;
CREATE INDEX ix1 ON MyTable (c, d, a, b)
Does the order of the columns here matter? If so, why? Also;
CREATE INDEX ix2 ON MyTable (c, d) INCLUDE (a, b)
What is the difference between this composite key and the one above? I don't understand what difference "INCLUDE" makes.
Note: I know there are a lot of posts on Composite Keys, but I believe my last two questions are specific enough to not be a duplicate.