2

I have this query

SELECT a.* 
FROM entries a 
INNER JOIN entries_keywords b ON a.id = b.entry_id 
INNER JOIN keywords c ON b.keyword_id = c.id 
WHERE c.key IN ('wake', 'up') 
GROUP BY a.id 
HAVING COUNT(*) = 2

but it's slow. How do I design indexes optimally to speed things up?

EDIT This is the current schema

CREATE TABLE `entries` (`id` integer PRIMARY KEY AUTOINCREMENT, `sha` text);
CREATE TABLE `entries_keywords` (`id` integer PRIMARY KEY AUTOINCREMENT, `entry_id` integer REFERENCES `entries`, `keyword_id` integer REFERENCES `keywords`);
CREATE TABLE `keywords` (`id` integer PRIMARY KEY AUTOINCREMENT, `key` string);
CREATE INDEX `entries_keywords_entry_id_index` ON `entries_keywords` (`entry_id`);
CREATE INDEX `entries_keywords_entry_id_keyword_id_index` ON `entries_keywords` (`entry_id`, `keyword_id`);
CREATE INDEX `entries_keywords_keyword_id_index` ON `entries_keywords` (`keyword_id`);
CREATE INDEX `keywords_key_index` ON `keywords` (`key`);

I'm using Sqlite3, the query doesn't fail, but is slow.

Right now I'm a query like this (subquery for each keyword):

select *
from (
    select *
    from (entries) e
    inner join entries_keywords ek on e.id = ek.entry_id
    inner join keywords k on ek.keyword_id = k.id
    where k.key = 'wake') e
inner join entries_keywords ek on e.id = ek.entry_id
inner join keywords k on ek.keyword_id = k.id
where k.key = 'up';

This is way faster but doesn't feel right since it's going to get ugly if I have a lot of keywords.

RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
Opptatt Jobber
  • 319
  • 5
  • 13
  • what indexes do you have on the tables now? – Taryn Sep 28 '12 at 21:08
  • Indexes on the referenced columns would help. The slow part is probably the last bit.. count * are not great for performance. I'm assuming this is mysql, but if it's another database you could use a partial index on keywords.key as an optimization. – Lucas Holt Sep 28 '12 at 21:08
  • I've added some info to my question – Opptatt Jobber Sep 28 '12 at 23:55

1 Answers1

1

The key indexes required for that query

  • keywords(key)
  • entries_keywords(keyword_id,entry_id)
  • entries(id)

You must be using MySQL, because the SELECT a.* would otherwise fail.
EDIT after the 2nd comment about this statement, let me point out why select a.* will fail here - it's because of the GROUP BY.

To explain, because the criteria (WHERE) is on c.key, it needs to be indexed.
This then goes up the JOIN against b.keyword_id. We create an index to include b.entry_id so that it never has to look up against the table - the index alone can cover the columns required.
Finally, a.id=b.entry_id joins back to the entries table, so we index the id of that table.

It is quite likely entries(id) is already the primary key, but you may have entries_keywords indexed the other way around - it won't work to satisfy this join.

RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262
  • 'Fail' is a relative term, sadly. MySQL allows you to omit all sorts of columns from the GROUP BY list that the SQL Standard and the majority of other DBMS require — all the columns that are not aggregates should be listed in the GROUP BY clause normally. MySQL is more lax about this, and selects values to display to suit itself. Where the grouping column(s) form a superset of the primary key (or another candidate key) of the table, it may get the answer you wanted; but it is not deterministic if the columns you're grouping by do not form a superset of a candidate key for the table. – Jonathan Leffler Sep 28 '12 at 23:12
  • @JonathanLeffler: To be precise the SQL-2003+ standard allows the use of columns in the `SELECT` list, even if they are not in the `GROUP BY` clause. As long as they are functionally dependent on them. (not that MySQL does any checks, they have implemented this in a rather sloppy way - allowing non-standard use and semi-random results in queries.) – ypercubeᵀᴹ Sep 28 '12 at 23:49
  • @Jon there's nothing relative about 'fail' here (in context). The query won't even pass the compiler. that's as `fail` as `fail` can get – RichardTheKiwi Sep 28 '12 at 23:49
  • Have you tested that in MySQL? I have not, but I'm led to believe that it might be accepted by the SQL compiler as syntactically and sufficiently semantically correct that the statement would run. (@ypercube: SQL 2003 allows that? Interesting! Thanks.) – Jonathan Leffler Sep 28 '12 at 23:51
  • @Jonathan: [Why does MySQL add a feature that conflicts with SQL standards?](http://stackoverflow.com/questions/7594865/why-does-mysql-add-a-feature-that-conflicts-with-sql-standards) – ypercubeᵀᴹ Sep 28 '12 at 23:55
  • @Jon I said "must be MySQL, `otherwise`.. will fail". Hope that clears things – RichardTheKiwi Sep 28 '12 at 23:55
  • @Richard: Well, seems like Postgres developers are on my side. They have [implemented the feature](http://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.1#SQL_and_PL.2FPgSQL_features), too (and probably in a correct way, unlike MySQL). – ypercubeᵀᴹ Sep 28 '12 at 23:59
  • @Opp Well, have you tried a composite index of (keyword_id,entry_id) list I listed? – RichardTheKiwi Sep 29 '12 at 00:06
  • @Richard: You mean in addition to CREATE INDEX `entries_keywords_entry_id_keyword_id_index` ON `entries_keywords` (`entry_id`, `keyword_id`); or flip this one around? Is there a better way to do the query? – Opptatt Jobber Sep 29 '12 at 00:17
  • @Opp: I would keep both these indexes (what you have and what Richard suggests). See also this relevant question: [How to filter SQL results in a has-many-through relation](http://stackoverflow.com/questions/7364969/how-to-filter-sql-results-in-a-has-many-through-relation) with more than 10 ways to rewrite this kind of queries (and benchmarks). The benchmarks are for Postgres but I'd guess the join version should be ok in MySQL, too. – ypercubeᵀᴹ Sep 29 '12 at 00:30
  • @ypercube Thanks! Btw I'm using sqlite3. – Opptatt Jobber Sep 29 '12 at 00:38