0

I have a list of about 200,000 records with EntityID column which I load into a temp table variable.

I want to insert any records from the Temp table variable if EntityID from the Temp table does not exist in the dbo.EntityRows table. The dbo.EntityRows table contains about 800,000 records.

The process is very slow compared to when the dbo.EntityRows table had about 500,000 records.

My first guess is because of the NOT EXISTS clause, each row from the Temp variable must scan the entire 800k rows of the dbo.EntityRows table to determine if it exists or not.

QUESTION: Are there alternative ways to run this comparison check without using the NOT EXISTS, which incurs a hefty cost and will only get worse as dbo.EntityRows continues to grow?

EDIT: Appreciate the comments. Here is the query (I left out the part after the IF NOT EXISTS check. After that, if NOT EXISTS, I insert into 4 tables).

declare @EntityCount int, @Counter int, @ExistsCounter int, @AddedCounter int
declare @LogID int
declare @YdataInsertedEntityID int, @YdataSearchParametersID int
declare @CurrentEntityID int
declare @CurrentName nvarchar(80)
declare @CurrentSearchParametersID int, @CurrentSearchParametersIDAlreadyDone int 
declare @Entities table 
(
    Id int identity,
    EntityID int,
    NameID nvarchar(80), 
    SearchParametersID int
)

insert into @Entities
select EntityID, NameID, SearchParametersID from YdataArvixe.dbo.Entity     order by entityid;


set @EntityCount = (select count(*) from @Entities);
set @Counter = 1;
set @LogID = null;
set @ExistsCounter = 0;
set @AddedCounter = 0;
set @CurrentSearchParametersIDAlreadyDone = -1;

While (@EntityCount >= @Counter)
begin
    set @CurrentEntityID = (select EntityID from @Entities
                                where id = @Counter)

    set @CurrentName = (select nameid from @Entities
                                    where id = @Counter);

    set @CurrentSearchParametersID = (select SearchParametersID from @Entities
                                            where id = @Counter)

    if not exists (select 1 from ydata.dbo.entity
                    where NameID = @CurrentName)
    begin
       -- I insert into 4 tables IF NOT EXISTS = true
    end
nanonerd
  • 1,964
  • 6
  • 23
  • 49
  • 4
    Generally, [`NOT EXISTS` is the fastest method](http://sqlperformance.com/2012/12/t-sql-queries/left-anti-semi-join). You may want to add index to speed up your query. – Felix Pamittan Jan 14 '16 at 06:56
  • 1
    Can you post your query. – Felix Pamittan Jan 14 '16 at 07:10
  • 1
    Could you replace the `table-variable` with `temporary-table`? Depending on your instance settings, the server could choose to use a parallel plan and optimize the query. The `table-variable` can not be part of such plan, as the server "thinks" it has only one row. – gotqn Jan 14 '16 at 07:16
  • 1
    Post your query, include your [actual execution plan](http://stackoverflow.com/a/7359705/243373). It is quite possible the execution plan will already give you a suggestion that you are missing an INDEX (which I believe will be the case). GL. – TT. Jan 14 '16 at 07:17
  • 1
    Also what gotqn said is spot on. Generally table variables have poor performance. (Local) temporary tables perform a lot better (in technical terms, will produce a better execution plan). – TT. Jan 14 '16 at 07:18
  • You might get some different performance by using MERGE but yeah, he general case is that temp tables get the cardinality really wrong if it's not 1 or some other small number. Do you really need a temp? – LoztInSpace Jan 14 '16 at 08:40
  • Could you try interect or except if both tables are of same format – TheGameiswar Jan 14 '16 at 09:19

3 Answers3

0

I am not sure but there are following ways how we can check

(SELECT COUNT(er.EntityID) FROM dbo.EntityRows er WHERE er.EntityID = EntityID) <> 0

(SELECT er.EntityID FROM dbo.EntityRows er WHERE er.EntityID = EntityID) IS NOT NULL

EntityID NOT EXISTS  (SELECT er.EntityID FROM dbo.EntityRows er)

EntityID NOT IN (SELECT er.EntityID FROM dbo.EntityRows er)

But as per my belief getting count will give good performance. Also index will help to improve performance as 'Felix Pamittan' said

D Mayuri
  • 456
  • 2
  • 6
  • 2
    Getting the count might give you **horribly bad** performance - since it needs to go over **all the rows** to determine the count - when you really only want to know whether a given value exists or not ...... `NOT EXISTS` will **stop** looping once it finds a matching value .... – marc_s Jan 14 '16 at 07:41
0

As @gotqn said, start by using a temporary table. Create an index on EntityID after the table is filled. If you don't have an index on EntityID in EntityRows, create one.

I do things like this a lot, and I generally use the following pattern:

INSERT INTO EntityRows (
    EntityId, ...
)

SELECT T.EntityId, ...
FROM #tempTable T
LEFT JOIN EntityRows E
ON T.EntityID = E.EntityID
WHERE E.EntityID IS NULL

Please comment if you'd like further info.

user5151179
  • 575
  • 2
  • 10
  • I should have used the Temp Table since the rows involved are not small. I will add an index on the Temp Table. Will post results, thanks ! – nanonerd Jan 15 '16 at 05:15
0

Well, the answer was pretty basic. @Felix and @TT had the right suggestion. Thanks!

I put a non-clustered index on the NameID field in ydata.dbo.entity.

if not exists (select 1 from ydata.dbo.entity
                    where NameID = @CurrentName)

So it can now process the NOT EXISTS part quickly using the index instead of scanning the entire dbo.entity table. It is moving fast again.

nanonerd
  • 1,964
  • 6
  • 23
  • 49