1

I have the following table with 1 billion records.

create table PfTest
(
    cola int,
    colb int,
    colc date,
    cold varchar(10),
    ID int
);

Now I want to display the records which are in a specific dates and not in specific dates.

For which I am using the following 2 types of queries:

Query 1:

select DISTINCT cola, colb, colc, cold, ID
from PfTest
WHERE colc In ('2014-01-01') 
  AND cold NOT IN (SELECT cold 
                   FROM PfTest 
                   WHERE ID = 1 
                     AND colc IN ('2014-01-02', '2014-01-03', 
                                  '2014-01-04', '2014-01-05', '2014-01-06'));

Query 2:

WITH cte AS
(
    SELECT DISTINCT cola, colb, colc, cold, ID
    FROM PfTest
    WHERE cold NOT IN (SELECT cold FROM PfTest 
                       WHERE ID = 1 
                         AND colc IN('2014-01-02', '2014-01-03',
                                     '2014-01-04', '2014-01-05', '2014-01-06'))
) 
SELECT cola, colb, colc, cold, ID
FROM cte 
WHERE colc IN ('2014-01-01');   

Above both query plans are same for execution. And both are taking huge time for execution. Can I write some better query for this situation?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
MAK
  • 6,824
  • 25
  • 74
  • 131

2 Answers2

2

This is your query, without the DISTINCT (which seems unnecessary):

select cola, colb, colc, cold, ID
from PfTest
WHERE colc In ('2014-01-01') AND 
      cold NOT IN (SELECT cold
                   from PfTest
                   WHERE ID = 1 AND
                         colc IN ('2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-06')
                  );

I would start with indexes. PFTest(colc, cold) and PFTest(id, colc, cold).

If the subquery returns lots of data -- like millions of rows -- then this might be a case where you are better off with a temporary table. I would try the index first. If that doesn't work, a temporary table with an index on cold might work. Also, although it will have little impact on performance, I would phrase the query using NOT EXISTS rather than NOT IN:

select cola, colb, colc, cold, ID
from PfTest t
WHERE colc In ('2014-01-01') AND 
      NOT EXISTS (SELECT 1
                  from PfTest t2
                  WHERE t2.cold = t1.cold AND t2.ID = 1 AND
                        t2.colc IN ('2014-01-02', '2014-01-03', '2014-01-04', '2014-01-05', '2014-01-06')
                 );

For this version, the best index is PfTest(cold, id, colc).

NOT EXISTS has more intuitive behavior when the matching columns have NULL values.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    If `ID` is the `PK` and also the clustering key, is the 2nd index necessary? – Felix Pamittan Jul 27 '15 at 10:54
  • @Gordon Linoff, On which column you have created primary key and clustered index? And on which columns non-clustered index? Have you used combined indexes? – MAK Jul 27 '15 at 11:04
  • @Gordon Linoff, Yes! You are right. Thank you so much. – MAK Jul 28 '15 at 04:59
1

Firstly

select DISTINCT cola, colb, colc, cold, ID
from PfTest
WHERE colc In ('2014-01-01') 
  AND cold NOT IN (SELECT cold 
                   FROM PfTest 
                   WHERE ID = 1 
                     AND colc IN ('2014-01-02', '2014-01-03', 
                                  '2014-01-04', '2014-01-05', '2014-01-06'));

is same as

select DISTINCT cola, colb, colc, cold, ID
from PfTest
WHERE colc In ('2014-01-01') 
  AND colc NOT IN ('2014-01-02', '2014-01-03', 
                                  '2014-01-04', '2014-01-05', '2014-01-06')

AND NOT(ID =1);

because the inner and outer tables are the same.

Since you don't want to re-use the tables again and again(because it contans a billion rows), having the data pulled on to a temp table is a better practice. And then creating suitable indices on it.

select cola, colb, colc, cold, ID
INTO #PfTest
FROM PfTest

CREATE NONCLUSTERED INDEX IX_PFTEST1 ON #PfTest(id) INCLUDE (cola, colb, colc, cold)
CREATE NONCLUSTERED INDEX IX_PFTEST2 ON #PfTest(colc) INCLUDE (cola, colb, id, cold)
CREATE NONCLUSTERED INDEX IX_PFTEST3 ON #PfTest(cold) INCLUDE (cola, colb, id, colc)

select cola, colb, colc, cold, ID
from #PfTest
WHERE colc In ('2014-01-01') 

INTERSECT

select cola, colb, colc, cold, id
from

(select cola, colb, colc, cold, ID
from #PfTest

EXCEPT


SELECT cola, colb, colc, cold, 1 id FROM #PfTest 
 where
 colc IN('2014-01-02', '2014-01-03',
                                 '2014-01-04', '2014-01-05', '2014-01-06'))A

Used EXCEPT in place of NOT IN to improve the performance a bit.

SouravA
  • 5,147
  • 2
  • 24
  • 49
  • On which table you have created index? On Temp table or Table? – MAK Jul 27 '15 at 11:15
  • Yup! This is perfect. Thank you so much – MAK Jul 27 '15 at 11:19
  • @SouravA The two queries you say are equivalent really are not. See this simple fiddle for an example where they return different results. If they were the same the result should be the same. http://www.sqlfiddle.com/#!3/c4cfc/1 Also, the third query is also not the same as the two others. – jpw Jul 27 '15 at 11:33
  • See this for a better example including the third query: http://www.sqlfiddle.com/#!3/c4cfc/5 – jpw Jul 27 '15 at 11:40
  • 1
    @SouravA There's a closing parentheses too much in the last query, but that aside, the three queries still return different results: http://www.sqlfiddle.com/#!3/c4cfc/8 This might not be a problem though as the OP accepted your answer :) To say that they do the same thing is incorrect in any case. – jpw Jul 27 '15 at 11:52
  • @SouravA The issue is that the original query had a predicate `AND cold NOT IN (SELECT cold ` that excluded rows based on the value in _cold_ which the later queries doesn't have. – jpw Jul 27 '15 at 13:31
  • The statement that the first two queries are equivalent strikes me as being obviously false. For instance, it is easy to construct data where the `id = 1` has data on '2014-01-01' and still meets the other condition, so it would be returned by the original query. I don't understand why the OP accepted this answer. – Gordon Linoff Jul 27 '15 at 17:15
  • @jpw and @gordonlinoff - Its embarassing, but I want to thank you guys. I was making a logical blunder! It took time for me to realize this. I had clearly confused `NOT IN` with `NOT`! – SouravA Jul 28 '15 at 03:37
  • @MAK - my answer is not correct in its current form and I am going to edit it. If that still works fine for you, then fine. Otherwise I request you to unselect this as an answer and mark the other reply as an answer. – SouravA Jul 28 '15 at 03:39