I have some ETL code running on SQL Server 2008 Standard. A relatively small number of rows (~50,000) are processed and loaded into a temp table. I then execute an insert query to copy over the rows which are not present in the much larger table (~1,000,000+ rows). The temp table contains the same primary key and clustered index as the destination table.
create table #NewClaims(ExtractDate datetime, SiteName nvarchar(50), SiteCd nvarchar(50), ContractTypeCd nvarchar(50),
ClaimRateType nvarchar(50), ClaimRateTypeCd nvarchar(50), ClaimStatus nvarchar(50), ClaimStatusCd nvarchar(50),
CreationDt datetime, StatusDt datetime, ClaimID nvarchar(50), SeqNum int, CreationUserID nvarchar(50),
SpecialClaimInd nvarchar(50), JobSeekerID nvarchar(50), InvoiceNum nvarchar(50), JobID nvarchar(50), JobRefId int,
RecoveryReason nvarchar(50), ClaimAmount money, GSTAmount money, ApprovedAmount money, ClaimCurrencyInd nvarchar(50),
EmployerID nvarchar(50), BaseRateType nvarchar(50), BaseRateTypeCd nvarchar(50)
constraint PK_NewClaims primary key clustered(ClaimID, ClaimStatusCD));
Here is the SQL that loads the temp table records into the target
insert into dbo.Claim(
ExtractDate, SiteName, SiteCd, ContractTypeCd, ClaimRateType, ClaimRateTypeCd, ClaimStatus, ClaimStatusCd, CreationDt,
StatusDt, ClaimID, SeqNum, CreationUserID, SpecialClaimInd, JobSeekerID, InvoiceNum, JobID, JobSeqNum, RecoveryReason,
ClaimAmount, GSTAmount, ApprovedAmount, ClaimCurrencyInd, EmployerID, BaseRateType, BaseRateTypeCd
)
select
n.ExtractDate, n.SiteName, n.SiteCd, n.ContractTypeCd, n.ClaimRateType, n.ClaimRateTypeCd, n.ClaimStatus, n.ClaimStatusCd,
n.CreationDt, n.StatusDt, n.ClaimID, n.SeqNum, n.CreationUserID, n.SpecialClaimInd, n.JobSeekerID, n.InvoiceNum, n.JobID,
n.JobRefId, n.RecoveryReason, n.ClaimAmount, n.GSTAmount, n.ApprovedAmount, n.ClaimCurrencyInd, n.EmployerID, n.BaseRateType,
n.BaseRateTypeCd
from #NewClaims as n
left join dbo.Claim as c on n.ClaimID = c.ClaimID and n.ClaimStatusCd = c.ClaimStatusCd
where c.ClaimID is null
When I run this the execution plan does something unusual. It refuses to utilise the PK clustered index on the dest table and attempts to use any other index available instead. Curiously it uses this index to retreieve the ClaimID and StatusCd. If I disable the indexes on the dest table one by one, the execution plan will keep trying to use other indexes until I've disabled all of them except the clustered, at which point it finally gives in and uses it, but produces a bunch of bitmap operations. The query runs faster when this occurs.
I've also experimented with adding an index hint: with (index(1)). This hint causes it to work as I expect, using the index and running much faster than the non hint version. The forced index seek shows scalar operators in the execution plan - does this indicate a problem?
Seek Keys[1]: Prefix: [ESD4].[dbo].[Claim].ClaimID, [ESD4].[dbo].[Claim].ClaimStatusCd = Scalar Operator([tempdb].[dbo].[#NewClaims].[ClaimID] as [n].[ClaimID]), Scalar Operator([tempdb].[dbo].[#NewClaims].[ClaimStatusCd] as [n].[ClaimStatusCd])
Is there something I'm missing? I don't like having to force SQL to use a specific execution plan, as index hints can often backfire.
Update
- Recomputing the table statistics didn't help
- Changing from left join to "where not exists" doesn't affect the choice of index used, but it did change the plan slightly. Instead of the hash match (Left Join), it's now doing Hash Match (Left Anti Semi Join), which is presumably faster.