I'm working on a very large table (approximately 2.7 million rows added per day), which has the following structure:
CREATE TABLE [dbo].[Result](
[ResultDate] [date] NOT NULL,
[Thing1Id] [int] NOT NULL,
[Num] [int] NOT NULL,
[Thing2Id] [int] NOT NULL,
CONSTRAINT [PK_Result] PRIMARY KEY CLUSTERED
(
[ResultDate] ASC,
[Thing1Id] ASC,
[Num] ASC
))
As the clustered primary key is on ResultDate, Thing1Id and Num, I would expect the following query to be optimal:
SELECT Thing2.*
FROM dbo.Result
INNER JOIN Thing2 ON Thing2.Id = result.Thing2Id
WHERE
ResultDate >= '2012-01-01'
AND
ResultDate <= '2012-01-30'
AND Thing1Id = 23
As you can see, the query is finding results in Jan-12 for a particular Thing1.
However, the execution plan indicates that a huge performance increase can be gained by adding the following index:
CREATE NONCLUSTERED INDEX [IX_Missing]
ON [dbo].[Result] ([Thing1Id],[ResultDate])
INCLUDE ([Num],[Thing2Id])
And surely enough, adding this index does improve the performance massively.
Can someone please explain why? As far as I'm concerned the results should be narrowed sufficiently using the clustered primary key, and adding this will make the index size much larger and add unnecessary overhead.
Can I index the table differently to get better performance?
(Please note, in reality the table is actually 2 tables unioned, data is shifted from one to the other daily, and the data is partitioned monthly).