
I already have this query running correctly, but the problem is that once my 4 tables grow large, it gets quite slow.

How can I optimize this?

SELECT 
    all_records.user_id,
    users.NAME,
    users.IMAGE
FROM (
    SELECT user_id FROM comments
    WHERE commentable_id   = #{object.id}
      AND commentable_type = '#{object.class.to_s}'
    UNION ALL
    SELECT user_id FROM hello
    WHERE helloable_id     = #{object.id}
      AND helloable_type   = '#{object.class.to_s}'
    UNION ALL
    SELECT user_id FROM foo
    WHERE fooable_id       = #{object.id}
      AND fooable_type     = '#{object.class.to_s}'
    UNION ALL
    SELECT user_id FROM bar
    WHERE barable_id       = #{object.id}
      AND barable_type     = '#{object.class.to_s}'
) AS all_records
INNER JOIN users ON users.id = all_records.user_id
GROUP BY
    all_records.user_id,
    users.NAME,
    users.IMAGE
LIMIT 15

What the query should do is get the unique users that did something on the (4) tables (pardon the renamed tables). Even with the LIMIT 15 it still runs slowly, because I think it still reads all 4 tables. Am I doing this right, or is there some way to optimize it?

For reference: I am using Postgres with Rails, but executing the query via `find_by_sql`.

EDIT

local postgres: 9.0.5; heroku postgres: 9.1

  • As I responded on IRC - run explain analyze, and show the output. Preferably pasted on http://explain.depesz.com/. –  Feb 12 '13 at 11:58
  • This is what is displayed in the explain http://explain.depesz.com/s/3uZ3 – index Feb 12 '13 at 12:08
  • 1
    @index: this is just `EXPLAIN`. We need `EXPLAIN ANALYZE` – mvp Feb 12 '13 at 12:10
  • First, as *always*, your version of PostgreSQL? Second, essential: do you want an *arbitrary* selection of 15 rows, a *truly random* selection or *all* rows? The first case is the cheapest by far. – Erwin Brandstetter Feb 12 '13 at 13:27
  • Explain shows that it doesn't "read all the 4 tables" - i.e. it doesn't do a seq scan on them. It does read data from all 4 tables, using index scans. If you don't need data from all 4 tables, why did you include them in the query? Also - explain **analyze** is the key, and version information could help too, in some cases. –  Feb 12 '13 at 14:45
  • @mvp: Sorry about that, here is an `explain analyze` - http://explain.depesz.com/s/C9K – index Feb 13 '13 at 02:29
  • @ErwinBrandstetter: Updated question with postgres version. Regarding the selection, I think arbitrary is okay. – index Feb 13 '13 at 02:31
  • @depesz: I just read about `explain analyze`, didn't know there is one like that. Thanks! As for the 4 tables, I just need the users that have done something on that certain object, that's why I need to check whether they have a record there. – index Feb 13 '13 at 02:34

1 Answer


Taking your question as it is: "get 15 arbitrary rows". That should be very fast.

SELECT u.id, u.name, u.image
FROM  (
   SELECT id
   FROM  (
      SELECT user_id AS id
      FROM   comments
      WHERE  commentable_id   = #{object.id}
      AND    commentable_type = '#{object.class.to_s}'

      UNION ALL
      SELECT user_id
      FROM   hello
      WHERE  helloable_id   = #{object.id}
      AND    helloable_type = '#{object.class.to_s}'

      UNION ALL
      SELECT user_id
      FROM   foo
      WHERE  fooable_id     = #{object.id}
      AND    fooable_type   = '#{object.class.to_s}'

      UNION ALL
      SELECT user_id
      FROM   bar
      WHERE  barable_id     = #{object.id}
      AND    barable_type   = '#{object.class.to_s}'
      ) AS a
   GROUP  BY id
   LIMIT  15
   ) b
JOIN   users u USING (id)
  • If you are running PostgreSQL 9.1 or later, you could simplify to GROUP BY id, assuming users.id is the primary key. But I take a more radical approach.
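    For illustration, an untested sketch of what that simplification would look like, assuming `users.id` is the primary key (from 9.1 on, columns functionally dependent on the grouped primary key need not be repeated in `GROUP BY`):

        SELECT users.id, users.name, users.image
        FROM  ( ... ) AS all_records   -- the four-table UNION ALL from the question
        JOIN   users ON users.id = all_records.user_id
        GROUP  BY users.id             -- name and image follow from the primary key
        LIMIT  15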

  • I pull the GROUP BY and LIMIT up one query level in the hope of enabling faster index scans on the base tables. With LIMIT 15 and no ORDER BY, sequential scans should not occur: Postgres can just read tuples from the top of the index and stop as soon as the limit is reached.
    Similar to this closely related case: Way to try multiple SELECTs till a result is available?
    Only here Postgres reads tuples from the index.

  • You might achieve the same effect by using LEFT JOIN users instead of JOIN (in place of my extra subquery level), since an inner JOIN can potentially drop rows from the result, which disables the simpler query plan.
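    A rough sketch of that variant (predicates abbreviated, same UNION ALL body as above; untested): since a LEFT JOIN cannot eliminate rows, the planner can satisfy the LIMIT from the union alone.

        SELECT a.id, u.name, u.image
        FROM  (
           SELECT user_id AS id FROM comments
           WHERE  commentable_id   = #{object.id}
           AND    commentable_type = '#{object.class.to_s}'
           UNION ALL
           -- ... same pattern for hello, foo and bar as above
           SELECT user_id FROM bar
           WHERE  barable_id   = #{object.id}
           AND    barable_type = '#{object.class.to_s}'
           ) a
        LEFT   JOIN users u ON u.id = a.id
        GROUP  BY a.id, u.name, u.image
        LIMIT  15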

  • For best performance, make sure you have indexes like

    CREATE INDEX comments_mult_idx
    ON comments (commentable_id, commentable_type, user_id)
    

    on all 4 tables. user_id has to be the last column. Here's why.
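    Spelled out for the other three tables, the same pattern would be (the index names here are just suggestions):

        CREATE INDEX hello_mult_idx ON hello (helloable_id, helloable_type, user_id);
        CREATE INDEX foo_mult_idx   ON foo   (fooable_id,   fooable_type,   user_id);
        CREATE INDEX bar_mult_idx   ON bar   (barable_id,   barable_type,   user_id);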

Erwin Brandstetter
  • Awesome! Thanks. So I ran an `explain analyze` on this query and on mine with the same object, and I get `Total runtime: 12.493 ms` from mine and `Total runtime: 1.663 ms` from yours. Is it okay to use this `Total runtime` from `explain analyze` to determine the speed of a query? – index Feb 13 '13 at 03:07
  • @index: Yes, `EXPLAIN ANALYZE` gives you the real time Postgres takes for your query (as opposed to plain `EXPLAIN`, which only shows the planner's estimates), not including network overhead. You may want to run it a couple of times to saturate the cache. The second or third run is usually a bit faster. [Read the manual for details!](http://www.postgresql.org/docs/current/interactive/sql-explain.html) – Erwin Brandstetter Feb 13 '13 at 08:55
  • Thanks! This works perfectly. :) Can you also explain what I did that made the query so slow? Was it because of the `join`? – index Feb 15 '13 at 03:17
  • This query enables a simpler plan (like I tried to explain above and in the linked answer). Comparing the output of `EXPLAIN ANALYZE` for both should give more details. – Erwin Brandstetter Feb 15 '13 at 05:56