Optimize sql except query

Question

I have a query that needs to be optimized and im not sure how. This is the sql script:

declare @tempTable table(color1 int, color2 int, color3 int, color4 int, newToken uniqueidentifier default newid(), ordinal int identifier(1,1))

insert into @tempTable 
select color1, color2, color3, color4
from @colorCombination
except
select c1.color as color1, c2.color as color2, c3.color as color3, c4.color as color4
from products p
inner join attributes c1 on c1.pId = p. Id and c1.type = 'primary'
inner join attributes c2 on c2.pId = p. Id and c2.type = 'secondary'
inner join attributes c3 on c3.pId = p. Id and c3.type = 'other1'
inner join attributes c4 on c4.pId = p. Id and c4.type = 'other2'
where p.category = 'furniture'

Without the except, both select statement run really fast but with the except, it took nearly 15minutes

@colorCombination has nearly 24,000 rows

The productattributes return 11,000 rows

I used the except because the need to find those 13,000 that are not in the database yet and then insert it.

This is running on sql server 2008

Is there a better way to get the missing records instead of using except?

score 2 · Answer 1 · edited Nov 13 '18 at 23:35

The poor performance from so few rows is caused by the table variables and how they impact the execution plan.

This answer uses temporary tables with nonclustered indexes on (color1, color2, color3, color4). Your original table variables did not specify not null for any colors, so I will assume that some null values are acceptable. If they are not, you could drop the surrogate Id and use a clustered index on the four color columns. To allow for nulls, this also switchs back to except instead of not exists(). If null is not a factor, then not exists() should be a bit faster.

Reference concerning table variables and temp tables:

/* step 1: existing colors in a temp table with an index */

create table #productColors (
    id int not null identity (1,1) primary key clustered
  , color1 int , color2 int , color3 int , color4 int 
  );

insert into #productColors (color1, color2, color3, color4)
select distinct
    color1  = case when c.[type] = 'primary'   then c.color end
  , color2  = case when c.[type] = 'secondary' then c.color end
  , color3  = case when c.[type] = 'other1'    then c.color end
  , color4  = case when c.[type] = 'other2'    then c.color end
from from products p
  inner join attributes c
    on p.Id = c.Id
group by p.id;

create nonclustered index ix_productColors
  on #productColors (color1, color2, color3, color4);

/* step 2: color combinations in a temp table with an index */

create table #colorCombinations (
    id int not null identity (1,1) primary key clustered
  , color1 int , color2 int , color3 int , color4 int 
);

insert into #colorCombinations (color1, color2, color3, color4)
select distinct color1 , color2 , color3 , color4 
from @colorCombinations;

create nonclustered index ix_colorCombinations
  on #colorCombinations (color1, color2, color3, color4);

/* step 3: insert new color combinations into #tempTable */

create table #tempTable (color1 int
  , color2 int
  , color3 int
  , color4 int
  , newToken uniqueidentifier default newid()
  , ordinal int identifier(1,1)
);

insert into #tempTable(color1, color2, color3, color4)
  select color1, color2, color3, color4
  from #colorCombination
  except
  select color1, color2, color3, color4
  from #productColors

Old answer:

Using except also de-duplicates the rows in your @tempTable (which might be better off as a #temptable instead of a @TableVariable depending on how you are using it).

If you do not need to de-duplicate the rows from @tempTable, then you can use not exists() (or add distinct to the select, but it will take a performance hit):

insert into @tempTable 
select color1, color2, color3, color4
from @colorCombination cc
where not exists (
  select 1
  from products p
  inner join attributes c1 on c1.pId = p. Id and c1.type = 'primary'
  inner join attributes c2 on c2.pId = p. Id and c2.type = 'secondary'
  inner join attributes c3 on c3.pId = p. Id and c3.type = 'other1'
  inner join attributes c4 on c4.pId = p. Id and c4.type = 'other2'
  where p.category = 'furniture'
    and c1.color = cc.color1
    and c2.color = cc.color2
    and c3.color = cc.color3
    and c4.color = cc.color4
)

I'd also like to inquire as to why the temp table/table variable is used at all? Unless the query results need to be used several times, I'd eliminate that part and directly use the query... This does depend on grander purpose, though. — Frank V, Apr 05 '17 at 18:02
Thanks for the quick response! I'll try it later and let you know. Thank you so much. I have edited the script and it has the columns for the temp table. After inserting in that, i need to bulk inserts the missing rows in 3 tables — Ikraam MX, Apr 05 '17 at 18:12
i tried the not exist.. sadly it's taking 16 something minutes now — Ikraam MX, Apr 06 '17 at 09:43
@IkraamMX Updated my answer to suggest switching to temporary tables. — SqlZim, Apr 06 '17 at 16:01

Optimize sql except query

1 Answers1