20

Is there a way I can improve this kind of SQL query performance:

INSERT
INTO ...
WHERE NOT EXISTS(Validation...)

The problem is when I have many data in my table (like million of rows), the execution of the WHERE NOT EXISTS clause if very slow. I have to do this verification because I can't insert duplicated data.

I use SQLServer 2005

thx

Vladimir Sachek
  • 1,126
  • 1
  • 7
  • 20
Melursus
  • 10,328
  • 19
  • 69
  • 103
  • Are you inserting from one table into another? – achinda99 Feb 16 '09 at 20:50
  • Yes but they are not formatted the same way. For example in table A the date is an integer 20070102 and in table B the table is a datetime – Melursus Feb 16 '09 at 21:34
  • usually if you can do NOT EXISTS then there is likely a way to swap to using EXISTS ... may need an extra temp table but it's possible to be quicker – whytheq Aug 15 '19 at 13:06

7 Answers7

14

Make sure you are searching on indexed columns, with no manipulation of the data within those columns (like substring etc.)

cjk
  • 45,739
  • 9
  • 81
  • 112
13

Off the top of my head, you could try something like:

 TRUNCATE temptable
 INSERT INTO temptable ...
 INSERT INTO temptable ... 
 ...
 INSERT INTO realtable
 SELECT temptable.* FROM temptable
 LEFT JOIN realtable on realtable.key = temptable.key
 WHERE realtable.key is null
Blorgbeard
  • 101,031
  • 48
  • 228
  • 272
  • 2
    In my tests in a table with millions of rows this has been the fastest solution by far... OP's option lasts several minutes, accepted (ck's) answer lasts several minutes, your solution lasts 10 seconds (and I'm using the correct indexes in all solutions) – David Espart Aug 02 '11 at 16:54
6

Try to replace the NOT EXISTS with a left outer join, it sometimes performs better in large data sets.

Otávio Décio
  • 73,752
  • 17
  • 161
  • 228
  • Funny, I've more often found the opposite. EXISTS will stop searching at the first match found, whereas a join makes all possible matches. Thus EXISTS ought to be faster. I think. – sfuqua Feb 16 '09 at 23:20
  • The thing is, NOT EXISTS will always cause a table scan whereas if you are careful with your join you might be working solely with indexes. – Otávio Décio Feb 16 '09 at 23:31
  • 2
    ávio, I don't think it's true that NOT EXISTS will always cause a table scan. See this post: http://social.msdn.microsoft.com/Forums/en-US/transactsql/thread/582544fb-beda-46c0-befd-4b28b5c2cdee/ – John M Gant Nov 11 '10 at 14:26
2

Outer Apply tends to work for me...

instead of:

from t1
where not exists (select 1 from t2 where t1.something=t2.something)

I'll use:

from t1
outer apply (
    select top 1 1 as found from t2 where t1.something=t2.something
) t2f
where t2f.found is null
b_levitt
  • 7,059
  • 2
  • 41
  • 56
1

Pay attention to the other answer regarding indexing. NOT EXISTS is typically quite fast if you have good indexes.

But I have had performance issues with statements like you describe. One method I've used to get around that is to use a temp table for the candidate values, perform a DELETE FROM ... WHERE EXISTS (...), and then blindly INSERT the remainder. Inside a transaction, of course, to avoid race conditions. Splitting up the queries sometimes allows the optimizer to do its job without getting confused.

dwc
  • 24,196
  • 7
  • 44
  • 55
0

If you can at all reduce your problem space, then you'll gain heaps of performance. Are you absolutely sure that every one of those rows in that table needs to be checked?

The other thing you might want to try is a DELETE InsertTable FROM InsertTable INNER JOIN ExistingTable ON <Validation criteria> before your insert. However, your mileage may vary

hova
  • 2,811
  • 20
  • 19
0
insert into customers 
select * 
from newcustomers 
where customerid not in (select customerid 
                         from customers)

..may be more efficient. As others have said, make sure you've got indexes on any lookup fields.

C B
  • 1,677
  • 6
  • 18
  • 20
SqlACID
  • 4,024
  • 20
  • 28