I my scenario, I have posts, which are grouped in categories. For an overview list of categories I want to display a summary of the top 10 posts with the categories (as opposed to the detail view of a category, that displays the full data). The top 10 posts are determined by a score, which comes from another table (actually an indexed view - but that doesn't matter here).
The table structure is the following:
CREATE TABLE [dbo].[Categories]
(
[Id] INT NOT NULL IDENTITY CONSTRAINT [PK_Categories] PRIMARY KEY,
[Key] CHAR(10) CONSTRAINT [UK_Categories_Key] UNIQUE,
[Caption] NVARCHAR(500) NOT NULL,
[Description] NVARCHAR(4000) NULL
)
GO
CREATE TABLE [dbo].[Posts]
(
[Id] INT NOT NULL IDENTITY CONSTRAINT [PK_Posts] PRIMARY KEY,
[CategoryId] INT NOT NULL CONSTRAINT [FK_Posts_Category] FOREIGN KEY REFERENCES [dbo].[Categories] ([Id]),
[Key] CHAR(10) CONSTRAINT [UK_Post_Key] UNIQUE,
[Text] NVARCHAR(4000) NULL,
[SummaryText] AS
CASE WHEN LEN([Text]) <= 400
THEN CAST([Text] AS NVARCHAR(400))
ELSE CAST(SUBSTRING([Text], 0, 399) + NCHAR(8230) AS NVARCHAR(400)) --First 399 characters and ellipsis
END
PERSISTED
)
GO
CREATE TABLE [dbo].[Scores] (
[Id] INT NOT NULL IDENTITY CONSTRAINT [PK_Scores] PRIMARY KEY,
[CategoryId] INT NOT NULL CONSTRAINT [FK_Scores_Category] FOREIGN KEY REFERENCES [dbo].[Categories] ([Id]),
[PostId] INT NOT NULL CONSTRAINT [FK_Scores_Post] FOREIGN KEY REFERENCES [dbo].[Posts] ([Id]),
[Value] INT NOT NULL
)
GO
CREATE INDEX [IX_Scores_CategoryId_Value_PostId]
ON [dbo].[Scores] ([CategoryId], [Value] DESC, [PostId])
GO
I can now use a view to get the top ten posts of each category:
CREATE VIEW [dbo].[TopPosts]
AS
SELECT c.Id AS [CategoryId], cp.PostId, p.[Key], p.SummaryText, cp.Value AS [Score]
FROM [dbo].[Categories] c
CROSS APPLY (
SELECT TOP 10 s.PostId, s.Value
FROM [dbo].[Scores] s
WHERE s.CategoryId = c.Id
ORDER BY s.Value DESC
) AS cp
INNER JOIN [dbo].[Posts] p ON cp.PostId = p.Id
I understand that the CROSS APPLY
will use the covering index IX_Scores_CategoryId_Value_PostId
, because it contains the category ID (for the WHERE
) the value (for the ORDER BY
and the SELECT
) and the post ID (for the SELECT
) and thus will be reasonably fast.
The question is now: what about the INNER JOIN
? The join predicate uses the post ID, which is the key of the Post
table's clustered index (the primary key). When I create a covering index that includes all the fields of the SELECT
(see below), can I significantly increase query performance (with a better execution plan, reduced I/O, index caching etc.), even though accessing the clustered index is already a pretty fast operation?
The covering index would look like this:
CREATE INDEX [IX_Posts_Covering]
ON [dbo].[Posts] ([Id], [Key], [SummaryText])
GO
UPDATE:
Since the direction of my question doesn't seem entirely clear, let me put down my thoughts in more detail. I am wondering if the covering index (or index with included columns) could be faster for the following reasons (and the performance gain woul be worth it):
- Hard drive access. The second index would be considerably smaller than the clustered index, SQL Server would have to go through less pages on the HD, which would yield better read performance. Is that correct and would you see the difference?
- Memory consumption. To load the data into the memory, I assume SQL Server would have to load the entire row into memory and then pick the columns it needs. Wouldn't that increase memory consumption?
- CPU. My assumption is that you wouldn't see a measurable difference in CPU usage, since extracting the row from the columns is not per se a CPU operation. Correct?
- Caching. My understanding is that you won't see much difference in caching, because SQL Server would only cache the data it returns, not the entire row. Or am I wrong?
These are basically (more or less educated) assumptions. I would appreciate it a lot if someone could enlighten me about this admittedly very specific issue.