Why this subquery is 10x faster than inner join

Question

I have two tables ecdict and favourite and one view which just inner join two tables.

CREATE TABLE ecdict('_id' INTEGER PRIMARY KEY, 'word' VARCHAR NOT NULL UNIQUE, 'meaning' VARCHAR, 'phonetic_string' VARCHAR, 'example' VARCHAR);
CREATE TABLE favourite('_id' INTEGER PRIMARY KEY, 'word' VARCHAR NOT NULL UNIQUE);

CREATE VIEW favourite_word as select ecdict.* from ecdict INNER JOIN favourite ON ecdict.word = favourite.word;

Here is the execution time of result:
Time: 0.010

select ecdict.* from ecdict where word in (select word from favourite);

Time: 0.226

select * from favourite_word;

Why they differ so much? It if is related to query plan, why sqlite chooses a slower one? How can I guide sqlite to choose the faster one? Thanks

score 1 · Accepted Answer · answered Nov 30 '14 at 13:30

1

You need to look at the query plan to see what is really happening. The following is (informed) speculation.

When you have an inner join with two sets of indexes, you basically have two possible query plans:

Scan the edict table and look up words in the favourites using the index on favourite.word.
Scan the favourite table and look up words in edicts using the index on edicts(word). Then look up the record in edit when there is a match.

(A unique contraint creates an index.)

All these being equal -- that is, with no information about table sizes -- the first approach would be preferred in this case. Why? Because there is no additional step to look up the additional columns.

When you phrase the query using in, I think that SQLite will always use the first approach. In this case, I am guessing that the favourite table is much smaller than the edict table, so the second approach is actually better. When you write the query as a join, both possibilities are considered and the better one is chosen.

answered Nov 30 '14 at 13:30

Gordon Linoff

1,242,037
58
646
786

Yes, favourite is much smaller. But won't sqlite choose the faster one for me? – Bear Nov 30 '14 at 13:43
@Bear . . . It will for the explicit `join`. But the `in` would focus on the index on `favourite`. It *could* consider the other direction, but that is really a very special case because the two columns are both declared unique. – Gordon Linoff Nov 30 '14 at 13:59
Thanks so much. But if explict join will find a better one for me, it should choose the second approach – Bear Nov 30 '14 at 14:09
@Bear . . . The explicit `join` does appear to choose the second option. – Gordon Linoff Nov 30 '14 at 14:23
Thanks for your help. But I really want to know about this more. For example, why it would choose the slower one and how can I make sqlite choose the faster one. I have update the question for this. – Bear Dec 01 '14 at 02:47
@Bear . . . Use the `join` syntax. It has more optimization methods. – Gordon Linoff Dec 01 '14 at 02:49
from my understanding, join is basically the same as inner join. reference: http://stackoverflow.com/questions/565620/difference-between-join-and-inner-join – Bear Dec 01 '14 at 15:53
@Bear . . . A `join` *is* the same. The issue is the `in`, not the `join` -- the `join` is producing the better plan. SQLite apparently doesn't look at every conceivable optimization path that the `join` does. And, this is a very special case because of the unique indexes in both tables. – Gordon Linoff Dec 01 '14 at 16:45

Why this subquery is 10x faster than inner join

1 Answers1