I have a simple two-stage SQL query that operators on two tables A
and B
, where I use a sub-select to retrieve a number of IDs of table A
that are stored as foreign keys in B, using a (possibly complex) query on table B (and possibly other joined tables). Then, I want to simply return the first x
IDs of A
. I tried using a query like this:
SELECT sq.id
FROM (
SELECT a_id AS id, created_at
FROM B
WHERE ...
ORDER BY created_at DESC
) sq
GROUP BY sq.id
ORDER BY max(sq.created_at) DESC
LIMIT 10;
which is quite slow as Postgres seems to perform the GROUP BY
/ DISTINCT
operation on the whole result set before limiting it. If I LIMIT
the sub-query (e.g. to 100), the performance is just fine (as I'd expect), but of course it's no longer guaranteed that there will be at least 10 distinct a_id
values in the resulting rows of sq
.
Similarly, the query
SELECT a_id AS id
FROM B
WHERE ...
GROUP BY id
ORDER BY max(created_at) DESC
LIMIT 10
is quite slow as Postgres seems to perform a sequential scan on B
instead of using an (existing) index. If I remove the GROUP BY
clause it uses the index just fine.
The data in table B
is such that most rows contain different a_id
s, hence even without the GROUP BY
most of the returned IDs will be different. The goal I pursue with the grouping is to assure that the result set always contains a given number of entries from A
.
Is there a way to perform an "incremental DISTINCT
/ GROUP BY
"? In my naive thinking it would suffice for Postgres to produce result rows and group them incrementally until it reaches the number specified by LIMIT
, which in most cases should be nearly instantaneous as most a_id
values are different. I tried various ways to query the data but so far I didn't find anything that works reliably.
The Postgres version is 9.6, the data schema as follows:
Table "public.a"
Column | Type | Modifiers
--------+-------------------+------------------------------------------------
id | bigint | not null default nextval('a_id_seq'::regclass)
bar | character varying |
Indexes:
"a_pkey" PRIMARY KEY, btree (id)
"ix_a_bar" btree (bar)
Referenced by:
TABLE "b" CONSTRAINT "b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)
Table "public.b"
Column | Type | Modifiers
------------+-----------------------------+--------------------------------------------------
id | bigint | not null default nextval('b_id_seq'::regclass)
foo | character varying |
a_id | bigint | not null
created_at | timestamp without time zone |
Indexes:
"b_pkey" PRIMARY KEY, btree (id)
"ix_b_created_at" btree (created_at)
"ix_b_foo" btree (foo)
Foreign-key constraints:
"b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)