4

I've a table with two columns beginrange and endrange. No overlapping ranges should be allowed.There are indexes on these columns and we've tried many sql conditions like

inputBegin between beginRange and endRange or
inputEnd between beginRange and endRange 

not ( inputEnd < beginRange or inputStart > endRange )

etc Which work fine, except they're very slow as the table contains over 5mil records.

Is there anyway to write a much efficient overlap check?

Edit: I've thought of one more solution, oracle will count the index only when count() is done on a NOT NULL column with an index. If beginRange and endRange are NOT NULL and both have an index we can have three sums:

count(endRange) where inputBegin > endRange
+
count(beginRange) where inputEnd < beginRange
=
count(beginRange/endRange)

so with UNION ALL I would get three rows, and in the code I need to check if sum of first two equals the third. Of course I'm assuming only index will be counted and no rows will be accessed. Any other way?

Rnet
  • 4,796
  • 9
  • 47
  • 83
  • When you say there are indexes on these columns can you specify exactly what indexes they are as it sounds like you have 2 separate indexes. – Ben Apr 12 '12 at 13:22
  • @Ben They're just two non unique indexes on the columns. – Rnet Apr 12 '12 at 13:26
  • Have you tried adding an index on both columns `(begin,end)` rather then each column `(begin)` and `(end)`? – Ben Apr 12 '12 at 13:27
  • @Ben: Good hint - we have a composite index and it works well. We also partition by `begin`. OP could post the plans for us to see... – Anonymous Apr 12 '12 at 13:29
  • @Ben We had a single index, which was a functional index (hex to dec), and when we used the same function in the where condition, the explain plan showed only index scan, but the autotrace showed full table scan, so it was done away with. – Rnet Apr 12 '12 at 13:35
  • @Ben the two columns were hex values, the composite index was created with their decimal values, and in the where condition we had used the same function as used in the index creation, but the autotrace showed table full scans. After which two redundant columns were added (decimal value columns of the hex columns), the above columns are those – Rnet Apr 12 '12 at 13:39
  • (I'm not sure if a compound index would be helpful) @Rnet: I think you should ask for this question to be migrated to dba.stackexchange.com There are a few Oracle experts that I think are best fit to answer this. – ypercubeᵀᴹ Apr 12 '12 at 13:46
  • 1
    It's not a DBA question. It just needs a realisation that no one index can help both conditions. Instead have two indexes, and two queries, and UNION ALL them together. – MatBailie Apr 12 '12 at 14:01
  • @Dems: It's an advanced querying/indexing problem, so a good fit for DBA.SE (I think). Are bitmapo indexes good enough for this? Maybe a spatial index will work? (no idea really, waiting for a good answer, too) – ypercubeᵀᴹ Apr 12 '12 at 14:08
  • @Rnet - My answer has bloated in the extreme. What is it that you want to achieve? `1.` Find existing entries that collide with other entries? `2.` Find existing entries that collide with a new entry, or new specified range? I've spent a lot of time dealing with overlapping time periods, and optimising them. If you can state your functional need I can look at the specifically for you. – MatBailie Apr 12 '12 at 14:48
  • @Dems I just need to know if an overlap exists, more precisely I would be happy to know if count > 0 and not know the exact value of count also. – Rnet Apr 12 '12 at 17:46
  • My final example (with the max length predicate) is the fastest I've ever managed. It does require you to effectively build a statistic of your own, but it does work wonders. I'd bet that your max length is very small relative to the range overes by the whole table, so using it will reduce the rows being scanned massively. And you an always *force* a max length; split ranges up into smaller ranges when they're being inserted. I've gained a 500 fold boost in performance by tuning that technique. – MatBailie Apr 12 '12 at 22:28
  • What kind of the performance you are getting, what is "very slow"? Also is that the performance of detecting there is at least one conflicting row, or the performance of finding all conflicting rows? – Branko Dimitrijevic Apr 13 '12 at 01:14

4 Answers4

1

I'm not sure whether you want to:

  1. check if a row you are about to insert overlaps with some of the existing rows, or
  2. search through all the existing rows and identify those that overlap?

If (1), then what you are essentially already doing...

SELECT *
FROM YOUR_TABLE
WHERE :inputEnd > beginRange AND :inputStart < endRange;

...will give you overlaps and should be very performant, provided you have a composite index whose components are in opposite directions: {beginRange ASC, endRange DESC}.


If (2), then you can utilize windowing like this:

SELECT *
FROM (
    SELECT
        YOUR_TABLE.*,
        LEAD(beginRange) OVER (ORDER BY beginRange) nextBeginRange
    FROM YOUR_TABLE
)
WHERE endRange > nextBeginRange;

This will give you every range that overlaps with its next range (where meaning of "next" is defined in the context of beginRange ordering).

Strictly, this doesn't even need a composite index (unless you want covering) - just a simple index on {beginRange} should ensure decent performance.

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167
  • A comment makes it clear that the Op needs your answer 1. That the composite index will help, however, I'm not certain. The first predicate will yield an indexed range seek (start of table)->(:inputEnd). The second predicate, however, can make little use of the second field in the index. Relatively few ranges will start at the same time, which makes the first field so highly selective that the second field may as well be random. Some additional optimisation is needed. – MatBailie Apr 12 '12 at 22:24
  • @Dems I saw `endRange` being used as access (as opposed just filter) predicate of index scan in the execution plan and I jumped to the conclusion that the index is optimally used, but the more I think about how index _must_ be used in this particular case, the more I think you are right. – Branko Dimitrijevic Apr 13 '12 at 10:23
1

This is an answer - if certain assertions can be made:

You have a table with beginRange and endRange columns where there ano no two existing rows with overlapping (beginRange, endRange).

You want to insert a new row with (inputStart, inputEnd) but check if it overlaps with any of the existing rows on the table.

Then you can use this condition which should be fast - with a simple index on startRange:

WHERE input_Start <
      ( SELECT endRange
        FROM
          ( SELECT endRange
                 , ROW_NUMBER() OVER(ORDER BY startRange DESC) AS rn 
            FROM tableX
            WHERE startRange < input_End
          ) tmp
        WHERE rn = 1
      )


  --- TRUE  --> Overlaps
  --- FALSE --> No overlap
ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Surely you should have an index on `startrange,endrange` in this situation so you don't need to do access the table by `rowid` in the inner select. You would only access this index then. – Ben Apr 13 '12 at 23:41
  • @Ben: Yeah, but it would only be a one-row read - with a clever optimizer. – ypercubeᵀᴹ Apr 13 '12 at 23:46
0

There is no one index that can satisfy this query. Which actually means that you're best creating two indexes, and running two queries, then UNIONing the results...

1) Create an Index on InputBegin
2) Create a separate Index on InputEnd
3) Run the following query

SELECT * FROM yourTable WHERE InputEnd   < ExclusionPeriodStart 
UNION ALL
SELECT * FROM yourTable WHERE InputBegin > ExclusionPeriodEnd

The first query can then use a range seek on the InputEnd index. The second query can also then use a range seek, but on a different index.

By keeping the queries separate, the two different requirements don't interfere with each other, and the most optimal index can be used.

You also already know (by understanding your data) that there is no overlap in the results (no record can start before it finishes, so no record can appear in both queries). This means that UNION ALL can be used instead of the slower UNION.

As far as I know, there is no way to perform this query faster than this. (On 5m records, it's probably faster to just scan the whole table on small data-sets.)


EDIT: That answer assumes that you're trying to find all records that do not appear inside a fixed range. If you want to check every record against every other record, then you need a different approach...

Checking for every overlap is expensive. Also, if you have these four ranges, working out which to delete is impossible...

1 -->--> 4
      3 -->--> 6
            5 -->--> 8
                  7 -->--> 9

Should you delete ranges 1 and 3, or 2 and 4?

What you can do, is find all the ranges that have another range overlapping with them.

And what you don't want is to find that A overlaps with B, and B overlaps with A.

SELECT
  *
FROM
  yourTable   AS first_range
INNER JOIN
  yourTable   AS second_range
    ON  second_range.start_date >= first_range.start_date
    AND second_range.start_date <= first_range.end_date

This will necesarrily scan the whole table for first_range. But because you only check the start_date of the second range, it will be able to use a range seek on the start_date index for any collisions.

EDIT2: Or perhaps you need the opposite of the first answer?

If you want all ranges that do collide with a set range, a modification of the same approach works.

SELECT * FROM yourTable WHERE InputEnd   >= ExclusionPeriodStart 
INTERSECT
SELECT * FROM yourTable WHERE InputBegin <= ExclusionPeriodEnd

This may not, however, be great. You'll take a percentage of the table in query1, and intersect it with almost all of the rest of the table. Instead you can fall back on the simple approach, but then add an optimisation...

SELECT
  *
FROM
  yourTable
WHERE
    InputStart <= ExclusionPeriodEnd
AND InputEnd   >= ExclusionPeriodStart

The first condition in the WHERE clause can be resolved with a range seek, and then scan all the resultant records to test for the second condition. So, can we reduce the range that needs scanning (currently (start of table) -> (ExclusionPeriodEnd)).

We can if we know one extra piece of information : The maximum length of any one range...

SELECT
  *
FROM
  yourTable
WHERE
    InputStart <= ExclusionPeriodEnd
AND InputStart >= ExclusionPeriodStart - (maximumLength)
AND InputEnd   >= ExclusionPeriodStart

Now the first TWO conditions form a range seek, and give a much smaller set of data to scan for the last condition.

How do you know the max lnegth though? You could scan the whole table, but that's a self defeating attempt at an optimisation.

Instead, you could index a calcuated field; a calculation that gives the max length of the range. SELECT MAX(calculatedField) FROM yourTable then avoids scanning the whole table. Or you could keep track with a trigger. Which is fine for INSERTS, but a bit messier when you have a DELETE (If you delete the longest range, do you scan the whole table again to find the new longest range? Probably not, you could be tempted to keep the old max length instead).

MatBailie
  • 83,401
  • 18
  • 103
  • 137
  • The query could also be written without Union, as `WHERE ExclusionPeriodStart <= InputEnd AND InputBegin <= ExclusionPeriodEnd`. I don't see a reason to split into 2 subqueries and a Union. – ypercubeᵀᴹ Apr 12 '12 at 14:06
  • @ypercube - That doesn't give the same result. You *could* use `WHERE InputEnd < ExclusionPeriodStart OR InputStart > ExclusionPeriodEnd` *(assuming you mean for my first answer, before edits)*. But that gives a scan of the whole table/index. No index helps - ordering for the benefit of one predicate gives an effectively random ordering with regards to the second predicate. By separating the queries you get two range seeks. So the prupose is to yield SEEKs rather than a SCAN. – MatBailie Apr 12 '12 at 14:46
0

Assuming existing ranges don't overlap, then {beginRange} should be a (primary or alternate) key and detecting if new range would overlap with some of the existing ones could be done like this:

SELECT *
FROM YOUR_TABLE
WHERE beginRange = (
    SELECT MAX(beginRange)
    FROM YOUR_TABLE
    WHERE beginRange < :inputEnd
)
AND :inputStart < endRange
  • If the new range overlaps with some of the existing ranges, this query returns the "highest" one.
  • If there is no overlap, it returns an empty result set.

The index "under" the {beginRange} key is enough for efficiency (we only need to support the "MAX scan").

Branko Dimitrijevic
  • 50,809
  • 10
  • 93
  • 167