1

Given a table of half-open unix timestamp intervals at microsecond precision [start_time_us; stop_time_us):

CREATE TABLE interval(
    start_time_us INTEGER NOT NULL,
    stop_time_us INTEGER NOT NULL
);

I want to select intervals which intersect a given query interval [:start_time_us, :stop_time_us):

SELECT * FROM interval
WHERE start_time_us < :stop_time_us
  AND :start_time_us < stop_time_us
ORDER BY start_time_us;

Trying to make it go fast, I create these covering indices:

CREATE INDEX interval_start_idx
ON interval(start_time_us, stop_time_us);

CREATE INDEX interval_stop_idx
ON interval(stop_time_us, start_time_us);

However, if I EXPLAIN QUERY PLAN the above query, I get:

QUERY PLAN
`--SEARCH interval USING INDEX interval_start_idx (start_time_us<?)

Which, as I understand it, means that SQLite will iterate rows of the interval_start_idx index B-tree from the beginning while start_time_us < :stop_time_us, returning only rows which have :start_time_us < stop_time_us. Which will, on average, iterate half the B-tree, which is O(N), which is too slow.

On reflection it makes sense, since to make use of the second index and iterate only the required section of the B-tree, SQLite needs to know that start_time_us < stop_time_us for every row. And even if it did know this (e.g. from a CHECK constraint), it's not smart enough to figure such an optimization out.

I can translate from one index to another in a subquery:

SELECT * FROM interval
WHERE
    (
        SELECT start_time_us
        FROM interval
        WHERE :start_time_us < stop_time_us
        ORDER BY stop_time_us
        LIMIT 1
    ) <= start_time_us
    AND start_time_us < :stop_time_us
ORDER BY start_time_us;

Which produces:

QUERY PLAN
|--SEARCH interval USING INDEX idx_interval_start (start_time_us>? AND start_time_us<?)
`--SCALAR SUBQUERY 1
   `--SEARCH interval USING INDEX idx_interval_stop (stop_time_us>?)

But it's only correct for non-overlapping intervals.

How can I do this in logarithmic time?

yuri kilochek
  • 12,709
  • 2
  • 32
  • 59
  • Possible duplicate: https://stackoverflow.com/questions/6571538/checking-a-table-for-time-overlap/6571603#6571603 – Bohemian Jun 07 '23 at 23:56
  • @Bohemian "half-open" means that one (in this case - lower) bound is included, and the other (in this case - upper) bound is excluded. It does not mean that one of the bounds doesn't exist. – yuri kilochek Jun 07 '23 at 23:57
  • @Bohemian https://stackoverflow.com/questions/6571538/checking-a-table-for-time-overlap/6571603#6571603 is not a duplicate because it's not concerned with performance. – yuri kilochek Jun 08 '23 at 00:01
  • What is the earliest possible value for `start_time_us`? – Bohemian Jun 08 '23 at 00:01
  • @Bohemian Are you trying to stuff it into `rtree_i32`? No, the range won't fit at the required precision. Just assume it's `-2^63` to `2^63 - 1`. – yuri kilochek Jun 08 '23 at 00:05
  • I've had success with this sort of thing when I know the interval durations (the differences between start and stop times) are bounded. It means the start time search (or the stop time search) can be bounded. But that technique may not fit your data. – O. Jones Jun 08 '23 at 16:14
  • This does not solve the same problem, but it may have some insight: https://mysql.rjweb.org/doc.php/ipranges – Rick James Jun 28 '23 at 04:47
  • Can the intervals (in `interval`) overlap? – Rick James Jun 28 '23 at 16:44
  • @RickJames currently they can, but I the data can be transformed into non-overlapping form, so I'm willing to accept a solution with this restriction. – yuri kilochek Jun 28 '23 at 17:25

2 Answers2

0

Something I have seen work is to use the bounded operator BETWEEN, rather than the open-ended < or >, that is logically equivalent but may trick the optimizer into better use of indexes:

SELECT * FROM interval
WHERE start_time_us BETWEEN -9223372036854775808 AND :stop_time_us
  AND stop_time_us BETWEEN :start_time_us - 1 AND 9223372036854775807 
ORDER BY start_time_us

The - 1 is to cater for stop_time_us being exclusive and BETWEEN being inclusive.

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • It doesn't: `QUERY PLAN --SEARCH interval USING INDEX idx_interval_start (start_time_us>? AND start_time_us)`. And I don't see how you expect it to. – yuri kilochek Jun 08 '23 at 00:12
0

If the intervals in the table non-overlapping with each other...

Given a sorted array of intervals, find the MAX(starttime) that is < :endtime. Then check that the corresponding endtime is > :starttime

pseudo-code:

$endtime = SELECT endtime FROM tbl WHERE starttime < :endtime ORDER BY starttime LIMIT 1;
$overlaps = $endtime > :starttime;

Note: INDEX(starttime, endtime) is needed.

(Edge cases: decide whether you need < versus <= and > versus >=.)

(In MySQL the SELECT would be O(logN); I don't know about SQLite.)

If the intervals in the table can overlap with each other, then you have to do a table scan and, in a worst case, all rows might "overlap".

Rick James
  • 135,179
  • 13
  • 127
  • 222