I have a query involving couples of rows which have a less-than-2-hours time-difference (~0.08333 days):
SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2
WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
This query is rather slow, i.e. ~ 1 second (the table has ~ 10k rows).
An idea was to use an INDEX
. Obviously CREATE INDEX id1 ON mytable(date)
didn't improve anything, that's normal.
Then I noticed that the magical query CREATE INDEX id2 ON mytable(JULIANDAY(date))
didn't help when using:
... WHERE ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.08333
didn't help when using:
... WHERE JULIANDAY(mt2.date) - 0.08333 < JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333
... but massively improved the performance (query time happily divided by 50 !) when using:
... WHERE JULIANDAY(mt1.date) < JULIANDAY(mt2.date) + 0.08333 AND JULIANDAY(mt1.date) > JULIANDAY(mt2.date) - 0.08333
Of course 1., 2. and 3. are equivalent since mathematically,
|x-y| < 0.08333 <=> y - 0.08333 < x < y + 0.08333
<=> x < y + 0.08333 AND x > y - 0.08333
Question: Why are solutions 1. and 2. not making use of INDEX whereas solution 3. is using it?
Note:
I'm using Python + Sqlite
sqlite3
moduleThe fact solutions 1. and 2. are not using the index is confirmed when doing
EXPLAIN QUERY PLAN SELECT ...
:(0, 0, 0, u'SCAN TABLE mytable AS mt1') (0, 1, 1, u'SCAN TABLE mytable AS mt2')
The fact solution 3. is using the index is shown when doing
EXPLAIN QUERY PLAN SELECT ...
:(0, 0, 1, u'SCAN TABLE mytable AS mt2') (0, 1, 0, u'SEARCH TABLE mytable AS mt1 USING INDEX id2 (<expr>>? AND <expr><?)')