Why SQL query takes too long time to execute when joining two tables and one table has lot of rows

Question

SELECT object_oid,
       parent_oid,
       object_type,
       filetype,
       token
FROM TemporaryFiles
WHERE token NOT IN (SELECT DISTINCT
                           TemporaryFiles.token
                    FROM TemporaryFiles
                         INNER JOIN Tokens ON TemporaryFiles.token = Tokens.token);

TemporaryFiles Table - Table has around ~5,000 Rows

----------------------------------------------------
Column Name     DataType
OBJECT_OID      int 
PARENT_OID      int 
OBJECT_TYPE     varchar(100)    
FileType        varchar(255)    
Token           varchar(255)    
DBKEY           int    ------------------------Primary Key

This table has one clustered index on Primary Key DBKey

Tokens Table - Table has around ~900,000 Rows

--------------------------------------------------------------
Column Name     DataType
OBJECT_OID      int
PARENT_OID      int
OBJECT_TYPE     nvarchar(64)
CHILD_TYPE      nvarchar(64)
FileType        nvarchar(32)
Token           nvarchar(255)
CreationTime    datetime
Contextual      bit
DBKEY           int      ------------------------Primary Key

There is one clustered index on Primary Key on DBKey And one non clustered index on OBJECT_OID ,OBJECT_TYPE,Contextual,

I tried to create one more non clustered index on Token for both tables. Still the query is taking infinite time to execute.

Even i tried to change "Not in" to "NOT EXIST" statement. But still issue i see.

FYI a `DISTINCT` with a `NOT IN` is overly going to cause unneeded overhead. — Thom A, Aug 18 '20 at 12:57
you are joining and filtering on Token but you don't have an index on that column. Then it 's normal that it will take time. — Wouter, Aug 18 '20 at 13:10
Your current Subquery can be replaced by `NOT IN (SELECT TemporaryFiles.token FROM TemporaryFiles)` (if both tokens are NOT NULL), but the optimizer should be smart enough to create the same plan in both cases. — dnoeth, Aug 18 '20 at 14:05
One more thing is this same query works fine in MSSql -2008 R2, But in MSSql 2012 it's taking more time. — AyrtonSenna, Aug 28 '20 at 05:17

score 1 · Answer 1 · answered Aug 18 '20 at 13:08

Might I suggest you try the following. Your DISTINCT is really not necessary and just adding a lot of extra work:

SELECT object_oid,
       parent_oid,
       object_type,
       filetype,
       token
FROM TemporaryFiles
WHERE NOT EXISTS (SELECT 1 FROM Tokens AS T WHERE T.Token = TemporaryFiles.Token);

score 0 · Answer 2 · answered Aug 18 '20 at 13:02

Maybe try using left join to get all temp files without tokens

    SELECT object_oid,
       parent_oid,
       object_type,
       filetype,
       token
FROM TemporaryFiles a
     left join Tokens ON TemporaryFiles.token = Tokens.token
where Tokens.token is null

Index wise: maybe test using nonclusted on token and the includes for the columns used in the select (why see explanation here) Indexing/Include

Why SQL query takes too long time to execute when joining two tables and one table has lot of rows

2 Answers2