There is no one index that can satisfy this query. Which actually means that you're best creating two indexes, and running two queries, then UNIONing the results...
1) Create an Index on InputBegin
2) Create a separate Index on InputEnd
3) Run the following query
SELECT * FROM yourTable WHERE InputEnd < ExclusionPeriodStart
UNION ALL
SELECT * FROM yourTable WHERE InputBegin > ExclusionPeriodEnd
The first query can then use a range seek on the InputEnd index.
The second query can also then use a range seek, but on a different index.
By keeping the queries separate, the two different requirements don't interfere with each other, and the most optimal index can be used.
You also already know (by understanding your data) that there is no overlap in the results (no record can start before it finishes, so no record can appear in both queries). This means that UNION ALL
can be used instead of the slower UNION
.
As far as I know, there is no way to perform this query faster than this. (On 5m records, it's probably faster to just scan the whole table on small data-sets.)
EDIT: That answer assumes that you're trying to find all records that do not appear inside a fixed range. If you want to check every record against every other record, then you need a different approach...
Checking for every overlap is expensive. Also, if you have these four ranges, working out which to delete is impossible...
1 -->--> 4
3 -->--> 6
5 -->--> 8
7 -->--> 9
Should you delete ranges 1 and 3, or 2 and 4?
What you can do, is find all the ranges that have another range overlapping with them.
And what you don't want is to find that A overlaps with B, and B overlaps with A.
SELECT
*
FROM
yourTable AS first_range
INNER JOIN
yourTable AS second_range
ON second_range.start_date >= first_range.start_date
AND second_range.start_date <= first_range.end_date
This will necesarrily scan the whole table for first_range. But because you only check the start_date of the second range, it will be able to use a range seek on the start_date index for any collisions.
EDIT2: Or perhaps you need the opposite of the first answer?
If you want all ranges that do collide with a set range, a modification of the same approach works.
SELECT * FROM yourTable WHERE InputEnd >= ExclusionPeriodStart
INTERSECT
SELECT * FROM yourTable WHERE InputBegin <= ExclusionPeriodEnd
This may not, however, be great. You'll take a percentage of the table in query1, and intersect it with almost all of the rest of the table. Instead you can fall back on the simple approach, but then add an optimisation...
SELECT
*
FROM
yourTable
WHERE
InputStart <= ExclusionPeriodEnd
AND InputEnd >= ExclusionPeriodStart
The first condition in the WHERE clause can be resolved with a range seek, and then scan all the resultant records to test for the second condition. So, can we reduce the range that needs scanning (currently (start of table) -> (ExclusionPeriodEnd))
.
We can if we know one extra piece of information : The maximum length of any one range...
SELECT
*
FROM
yourTable
WHERE
InputStart <= ExclusionPeriodEnd
AND InputStart >= ExclusionPeriodStart - (maximumLength)
AND InputEnd >= ExclusionPeriodStart
Now the first TWO conditions form a range seek, and give a much smaller set of data to scan for the last condition.
How do you know the max lnegth though? You could scan the whole table, but that's a self defeating attempt at an optimisation.
Instead, you could index a calcuated field; a calculation that gives the max length of the range. SELECT MAX(calculatedField) FROM yourTable
then avoids scanning the whole table. Or you could keep track with a trigger. Which is fine for INSERTS, but a bit messier when you have a DELETE (If you delete the longest range, do you scan the whole table again to find the new longest range? Probably not, you could be tempted to keep the old max length instead).