1

(Probably a duplicate but I can only find questions and solutions with the JOIN [3] and that's not an option.)

I have two tables. Both very thin (few columns) and very long (many rows). One is the data table (articles) and one is the ACL table (acl).

I want to show only the articles I have access to via acl.some_id. Which subquery is faster?

[1]
SELECT a.title
FROM articles a
WHERE 0 < (
  SELECT COUNT(1)
  FROM acl
  WHERE article_id = a.id AND some_id IN (1, 2, 3)
)

or

[2]
SELECT a.title
FROM articles a
WHERE a.id IN (
  SELECT article_id
  FROM acl WHERE some_id IN (1, 2, 3)
)

My mind would say the second one, because that subquery can be reused for all potentially matching row, so will only be executed once (although the result set will be very large), while the subquery in the first will have to check for EVERY potentially matching row.

There's a third way, but that's not an option, because it would duplicate rows (and GROUP BY is not the solution because I need a COUNT for something else later (and DISTINCT is never a solution!)):

[3]
SELECT a.title
FROM articles a
JOIN acl
  ON acl.article_id = a.id
WHERE acl.some_id IN (1, 2, 3)

Since article_id X exists N times in acl, it would return that row 0 - N times instead of 0 - 1.

There's also a fourth way: EXISTS. Thanks to ypercube.

Related:

Community
  • 1
  • 1
Rudie
  • 52,220
  • 42
  • 131
  • 173
  • 5
    You wrote the code... **TRY IT AND FIND OUT!** – Sam Axe Mar 20 '13 at 22:20
  • Trying it once or 15 times doesn't really mean anything. I want the why. And the tables aren't long yet =) so the execution time will be very, very short. – Rudie Mar 20 '13 at 22:21
  • 3
    mysql has all the tools you need to answer this yourself. Look at the execution plan for each query. Review the execution times. You can do this. – Sam Axe Mar 20 '13 at 22:24
  • 1
    Trying it 1000 times and taking the min, max, and average execution times does mean something. In fact, if I tried them both 15 times and got the same result all 15 times, I'd be convinced. – Dan Bracuk Mar 20 '13 at 22:26

1 Answers1

5

I would say [2], too, but MySQL has some blind spots in optimizing IN subqueries, at least up to 5.5. There are several improvements of the query optimizer in the (newly released) 5.6 version. You can read about (semijoins and IN subqueries) in the MySQL docs: MySQL 5.6: Optimizing Subqueries with Semi-Join Transformations.

There are also several improvements of the optimizer in MariaDB (versions 5.3 and 5.5), and some are related to this kind of queries. You can read about in their docs: MariaDB 5.3: Semi-join subquery optimizations.

You can also try the EXISTS version, especially if you are using 5.5 or older version:

-- [4]
SELECT id
FROM articles AS a
WHERE EXISTS (
  SELECT *
  FROM acl 
  WHERE acl.some_id IN (1, 2, 3)
    AND acl.article_id = a.id 
) ;

I think an index on (article_id, some_id) will be useful here - or perhaps the reverse one, it doesn't hurt to try both.


If there is foreign key from acl (article_id) REFERENCES article (id) that you can trust, and you only need the article IDs, you can also get data from one table only:

SELECT DISTINCT article_id
FROM acl 
WHERE acl.some_id IN (1, 2, 3) ;

Of course you should test the several versions you have in your server, the version of MySQL you have (or plan to use), with your data distribution, and of course with large enough sizes of tables. Testing with a few hundred rows will not tell you much.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Ah yes EXISTS, that's an option too. Sounds like it was made to do just that. (I have those indexes btw.) – Rudie Mar 20 '13 at 22:29
  • EXISTS it is, I think. Much faster than IN + COUNT apparently: http://www.jortk.nl/2008/07/exists-much-faster-then-in-in-mysql/ (although that's very old). Probably because EXISTS stops/returns after 1 record found. – Rudie Mar 20 '13 at 22:31
  • The last query (only returning article IDs) isn't sufficient, because I want much more than the article ID. Sorry. Not obvious enough. – Rudie Mar 20 '13 at 22:36
  • @Rudie A smart query planner would stop the "count" after 1 as well because it satisfies the outer condition .. head the closing paragraph in this reply. That being said, using `EXISTS` also exposes the semantics more cleanly. –  Mar 20 '13 at 22:39
  • @pst How would I 'stop' the count after 1..? (Even if I can do that, I think MySQL is better at it, so EXISTS must be more efficient.) – Rudie Mar 20 '13 at 22:41
  • Check also the updated answer and the links to the latest (and not so latest) optimizer improvements. – ypercubeᵀᴹ Mar 20 '13 at 22:47
  • That's somewhat of a coincidence: MySQL addresses exactly my problem in its latest version. I don't have MySQL 5.6.5 or MariaDB so I can't benchmark against EXISTS, but I'll remember this. Thanks for the very useful links. – Rudie Mar 20 '13 at 22:58