2

I have some ETL code running on SQL Server 2008 Standard. A relatively small number of rows (~50,000) are processed and loaded into a temp table. I then execute an insert query to copy over the rows which are not present in the much larger table (~1,000,000+ rows). The temp table contains the same primary key and clustered index as the destination table.

create table #NewClaims(ExtractDate datetime, SiteName nvarchar(50), SiteCd nvarchar(50), ContractTypeCd nvarchar(50), 
ClaimRateType nvarchar(50), ClaimRateTypeCd nvarchar(50), ClaimStatus nvarchar(50), ClaimStatusCd nvarchar(50), 
CreationDt datetime, StatusDt datetime, ClaimID nvarchar(50), SeqNum int, CreationUserID nvarchar(50), 
SpecialClaimInd nvarchar(50), JobSeekerID nvarchar(50), InvoiceNum nvarchar(50), JobID nvarchar(50), JobRefId int,
RecoveryReason nvarchar(50), ClaimAmount money, GSTAmount money, ApprovedAmount money, ClaimCurrencyInd nvarchar(50), 
EmployerID nvarchar(50), BaseRateType nvarchar(50), BaseRateTypeCd nvarchar(50)
constraint PK_NewClaims primary key clustered(ClaimID, ClaimStatusCD));

Here is the SQL that loads the temp table records into the target

insert into dbo.Claim(
ExtractDate, SiteName, SiteCd, ContractTypeCd, ClaimRateType, ClaimRateTypeCd, ClaimStatus, ClaimStatusCd, CreationDt, 
StatusDt, ClaimID, SeqNum, CreationUserID, SpecialClaimInd, JobSeekerID, InvoiceNum, JobID, JobSeqNum, RecoveryReason, 
ClaimAmount, GSTAmount, ApprovedAmount, ClaimCurrencyInd, EmployerID, BaseRateType, BaseRateTypeCd
)
select 
n.ExtractDate,  n.SiteName, n.SiteCd, n.ContractTypeCd, n.ClaimRateType, n.ClaimRateTypeCd, n.ClaimStatus, n.ClaimStatusCd, 
n.CreationDt, n.StatusDt, n.ClaimID, n.SeqNum, n.CreationUserID, n.SpecialClaimInd, n.JobSeekerID, n.InvoiceNum, n.JobID, 
n.JobRefId, n.RecoveryReason, n.ClaimAmount, n.GSTAmount, n.ApprovedAmount, n.ClaimCurrencyInd, n.EmployerID, n.BaseRateType, 
n.BaseRateTypeCd
from #NewClaims as n
left join dbo.Claim as c on n.ClaimID = c.ClaimID and n.ClaimStatusCd = c.ClaimStatusCd
where c.ClaimID is null

When I run this the execution plan does something unusual. It refuses to utilise the PK clustered index on the dest table and attempts to use any other index available instead. Curiously it uses this index to retreieve the ClaimID and StatusCd. If I disable the indexes on the dest table one by one, the execution plan will keep trying to use other indexes until I've disabled all of them except the clustered, at which point it finally gives in and uses it, but produces a bunch of bitmap operations. The query runs faster when this occurs.

I've also experimented with adding an index hint: with (index(1)). This hint causes it to work as I expect, using the index and running much faster than the non hint version. The forced index seek shows scalar operators in the execution plan - does this indicate a problem?

Seek Keys[1]: Prefix: [ESD4].[dbo].[Claim].ClaimID, [ESD4].[dbo].[Claim].ClaimStatusCd = Scalar Operator([tempdb].[dbo].[#NewClaims].[ClaimID] as [n].[ClaimID]), Scalar Operator([tempdb].[dbo].[#NewClaims].[ClaimStatusCd] as [n].[ClaimStatusCd])

Is there something I'm missing? I don't like having to force SQL to use a specific execution plan, as index hints can often backfire.

Update

  • Recomputing the table statistics didn't help
  • Changing from left join to "where not exists" doesn't affect the choice of index used, but it did change the plan slightly. Instead of the hash match (Left Join), it's now doing Hash Match (Left Anti Semi Join), which is presumably faster.
Trent
  • 1,089
  • 3
  • 12
  • 24
  • Are your statistics up to date on your destination table? – Aushin Apr 18 '13 at 04:10
  • Not sure. How can I check or refresh them? – Trent Apr 18 '13 at 04:21
  • 1
    You can update them by running this on your table http://msdn.microsoft.com/en-us/library/ms187348.aspx – Aushin Apr 18 '13 at 04:31
  • 3
    Of course it uses any other index, because other indexes take less space on disk (= less IO) and they all contain PK/clustered index data anyway. Not that this will always be optimal, but optimizer values IO saving. – Arvo Apr 18 '13 at 08:29
  • 1
    Can you please copy the Actual Execution Plans, one with and without index hint? Also the Create table and indexes for Claim table if they are different than #NewClaims – Afroz Apr 22 '13 at 19:29
  • I believe Arvo's hit the nail on the head – iruvar May 17 '13 at 19:22
  • Yes, Arvo's right, it does this because normally it is *faster* to do it that way. The only real mystery here is why it would be faster when you force it to use the PK/Clustered index? To answer that we'd need to see the (Actual) Execution Plans for both in order to compare. – RBarryYoung May 17 '13 at 20:18

1 Answers1

0

It's difficult to say without more information, but you can test this. Sometimes it may be faster to get an output of the PK IDs that you want to insert into a table variable (@PKIDs), then in the update statement, use 'select ... from #newClaims inner join @PKIDs'. It's difficult to say without the plan and having the dataset, but maybe in your situation it could help. If you inner join, you can use an index hint for the PK. The #NewClaims table may also only need a PK on ClaimID and ClaimStatusCd, reducing the time to build #NewClaims.

declare @PKIDs table (ClaimID INT PRIMARY KEY)

insert into @PKIDs (ClaimID)
select distinct n.ClaimID
from #NewClaims as n
where not exists (select 1 from dbo.Claim as c where c.ClaimID = n.ClaimID and n.ClaimStatusCd = c.ClaimStatusCd)

insert into dbo.Claim(
ExtractDate, SiteName, SiteCd, ContractTypeCd, ClaimRateType, ClaimRateTypeCd, ClaimStatus, ClaimStatusCd, CreationDt, 
StatusDt, ClaimID, SeqNum, CreationUserID, SpecialClaimInd, JobSeekerID, InvoiceNum, JobID, JobSeqNum, RecoveryReason, 
ClaimAmount, GSTAmount, ApprovedAmount, ClaimCurrencyInd, EmployerID, BaseRateType, BaseRateTypeCd
)
select 
n.ExtractDate,  n.SiteName, n.SiteCd, n.ContractTypeCd, n.ClaimRateType, n.ClaimRateTypeCd, n.ClaimStatus, n.ClaimStatusCd, 
n.CreationDt, n.StatusDt, n.ClaimID, n.SeqNum, n.CreationUserID, n.SpecialClaimInd, n.JobSeekerID, n.InvoiceNum, n.JobID, 
n.JobRefId, n.RecoveryReason, n.ClaimAmount, n.GSTAmount, n.ApprovedAmount, n.ClaimCurrencyInd, n.EmployerID, n.BaseRateType, 
n.BaseRateTypeCd
from #NewClaims as n
join @PKIDs as pkids on pkids.ClaimID = n.ClaimID
tommy_o
  • 3,640
  • 3
  • 29
  • 33