0

I need to select records from a table with 5 million records based on a date range. It is taking 19 seconds to return results. Only the ID field is a primary key and the date field is not indexed.

The selected records counts is only 500. What can be done to optimise the query?

 select * 
 from productdetail 
 where createddate >= '01 Jan 2016' 
      and createddate <= '05 mar 2016'

Any suggestions/help appreciated.

Tanner
  • 22,205
  • 9
  • 65
  • 83
Joe
  • 13
  • 4
  • 5
    Is there a specific reason that prevents you from indexing `createddate`? – Branko Dimitrijevic May 27 '16 at 07:55
  • You may want to use your search skills and learn more about query optimalization, indexes, partitioning and generally about how to handle large tables. Your question with the provided information is way too broad to provide a good answer. Each and every database server is a bit different (considering HW, settings, other app running, infrastructure). We can suggest solutions, but you have to do tour own research to find the best one. – Pred May 27 '16 at 08:05
  • 2
    Only 19 seconds to search 5 million records without index!?! That's not bad at all. – jarlh May 27 '16 at 08:52

1 Answers1

0

Generally, there are no better approaches, searching for a non-indexed column must perform linear search. But in your case, if the primary key is auto increment and createddate represents the record creation date (which means greater primary key, later createddate date/time record), you may manually perform a binary search on the primary key.

Below is an example, with ID is an auto increment primary key (indexed) starting from 0. Each selection statement involved ID only so the query is fast. Please note that this may work only if the createddate is sorted relative to the primary key ID:

DECLARE @lowerBoundID INT
DECLARE @upperBoundID INT
DECLARE @maxID INT
SELECT @maxID = MAX(ID) FROM productdetail
SET @lowerBoundID = 0
SET @upperBoundID = @maxID

IF (SELECT createddate FROM productdetail WHERE ID = @lowerBoundID) > '05 mar 2016' OR (SELECT createddate FROM productdetail WHERE ID = @upperBoundID) < '01 Jan 2016'
BEGIN
    DECLARE @first INT
    DECLARE @last INT
    DECLARE @middle INT

    SET @first = 0
    SET @last = @maxID
    SET @middle = (@first + @last) / 2
    WHILE @last - @first > 1
    BEGIN
        IF (SELECT createddate FROM productdetail WHERE ID = @middle) < '01 Jan 2016'
            SET @first = @middle
        ELSE
            SET @last = @middle

        SET @middle = (@first + @last) / 2
    END

    IF (SELECT createddate FROM productdetail WHERE ID = @first) < '01 Jan 2016'
        SET @lowerBoundID = @last
    ELSE
        SET @lowerBoundID = @first


    SET @first = 0
    SET @last = @maxID
    SET @middle = (@first + @last) / 2
    WHILE @last - @first > 1
    BEGIN
        IF (SELECT createddate FROM productdetail WHERE ID = @middle) > '05 mar 2016'
            SET @last = @middle
        ELSE
            SET @first = @middle

        SET @middle = (@first + @last) / 2
    END

    IF (SELECT createddate FROM productdetail WHERE ID = @last) > '05 mar 2016'
        SET @upperBoundID = @first
    ELSE
        SET @upperBoundID = @last

    SELECT * FROM productdetail WHERE ID >= @lowerBoundID AND ID <= @upperBoundID
END
Mango Wong
  • 629
  • 6
  • 16
  • 2
    what do you mean of binary search? how can we do binary search in SELECT statement? – FLICKER May 27 '16 at 08:04
  • There is a big assumption that `createdate` must be order the same way as the primary key. If so, can use a `T-SQL` `While` statement to perform a binary search on the primary key to find the boundary of the primary key values which `createdate` is within the period. The resulting primary key range is also that for `createdate`. – Mango Wong May 27 '16 at 08:28
  • @MangoWong Could you please elaborate? Binary search works on an array in memory (you need **random access** to "jump" around and halve the range in each iteration). How would you do a binary search without loading the whole table in memory (and defeating the reason for doing it in the first place)? – Branko Dimitrijevic May 27 '16 at 08:43
  • @BrankoDimitrijevic If the primary key is auto increment by 1, it can be used as an index just like searching in array (query by indexed column approaches O(logN) operation). Added a T-SQL example. – Mango Wong May 27 '16 at 10:01
  • @MangoWong cool, but... how many of it do you have on a real production server? For fun - yes, well done. – Ivan Starostin May 27 '16 at 10:21
  • @IvanStarostin You are right, his may work only if `createddate` is by `getdate()` in the insertion. Well, I was not planning to explain this with detailed example when I said it out:p – Mango Wong May 27 '16 at 10:31
  • @MangoWong Thanks, I see what you are doing (+1). Not exactly a "binary search" by the classical definition, but the basic idea is the same. Have you had a chance to benchmark it? I suspect it would become faster than linear search for larger number of elements (assuming more-less uniform distribution of PK values, which is reasonable to expect for auto-incremented field)... Can you give us some sense where the "break-even point" is, where your algorithm becomes faster than linear search? Thousands of rows, millions of rows, billions of rows? – Branko Dimitrijevic May 27 '16 at 10:44