This is a case of relational-division - with the added special requirement that the same conversation shall have no additional users.
Assuming the PK of table "conversationUsers"
is on ("userId", "conversationId")
, which enforces uniqueness of combinations, NOT NULL
and also provides the essential index for performance implicitly. Columns of the multicolumn PK in this order. Ideally, you have another index on ("conversationId", "userId")
. See:
For the basic query, there is the "brute force" approach to count the number of matching users for all conversations of all given users and then filter the ones matching all given users. OK for small tables and/or only short input arrays and/or few conversations per user, but doesn't scale well:
SELECT "conversationId"
FROM "conversationUsers" c
WHERE "userId" = ANY ('{1,4,6}'::int[])
GROUP BY 1
HAVING count(*) = array_length('{1,4,6}'::int[], 1)
AND NOT EXISTS (
SELECT FROM "conversationUsers"
WHERE "conversationId" = c."conversationId"
AND "userId" <> ALL('{1,4,6}'::int[])
);
Eliminating conversations with additional users with a NOT EXISTS
anti-semi-join. More:
Alternative techniques:
There are various other, (much) faster relational-division query techniques. But the fastest ones are not well suited for a dynamic number of user IDs.
For a fast query that can also deal with a dynamic number of user IDs, consider a recursive CTE:
WITH RECURSIVE rcte AS (
SELECT "conversationId", 1 AS idx
FROM "conversationUsers"
WHERE "userId" = ('{1,4,6}'::int[])[1]
UNION ALL
SELECT c."conversationId", r.idx + 1
FROM rcte r
JOIN "conversationUsers" c USING ("conversationId")
WHERE c."userId" = ('{1,4,6}'::int[])[idx + 1]
)
SELECT "conversationId"
FROM rcte r
WHERE idx = array_length(('{1,4,6}'::int[]), 1)
AND NOT EXISTS (
SELECT FROM "conversationUsers"
WHERE "conversationId" = r."conversationId"
AND "userId" <> ALL('{1,4,6}'::int[])
);
For ease of use wrap this in a function or prepared statement. Like:
PREPARE conversations(int[]) AS
WITH RECURSIVE rcte AS (
SELECT "conversationId", 1 AS idx
FROM "conversationUsers"
WHERE "userId" = $1[1]
UNION ALL
SELECT c."conversationId", r.idx + 1
FROM rcte r
JOIN "conversationUsers" c USING ("conversationId")
WHERE c."userId" = $1[idx + 1]
)
SELECT "conversationId"
FROM rcte r
WHERE idx = array_length($1, 1)
AND NOT EXISTS (
SELECT FROM "conversationUsers"
WHERE "conversationId" = r."conversationId"
AND "userId" <> ALL($1);
Call:
EXECUTE conversations('{1,4,6}');
db<>fiddle here (also demonstrating a function)
There is still room for improvement: to get top performance you have to put users with the fewest conversations first in your input array to eliminate as many rows as possible early. To get top performance you can generate a non-dynamic, non-recursive query dynamically (using one of the fast techniques from the first link) and execute that in turn. You could even wrap it in a single plpgsql function with dynamic SQL ...
More explanation:
Alternative: MV for sparsely written table
If the table "conversationUsers"
is mostly read-only (old conversations are unlikely to change) you might use a MATERIALIZED VIEW
with pre-aggregated users in sorted arrays and create a plain btree index on that array column.
CREATE MATERIALIZED VIEW mv_conversation_users AS
SELECT "conversationId", array_agg("userId") AS users -- sorted array
FROM (
SELECT "conversationId", "userId"
FROM "conversationUsers"
ORDER BY 1, 2
) sub
GROUP BY 1
ORDER BY 1;
CREATE INDEX ON mv_conversation_users (users) INCLUDE ("conversationId");
The demonstrated covering index requires Postgres 11. See:
About sorting rows in a subquery:
In older versions use a plain multicolumn index on (users, "conversationId")
. With very long arrays, a hash index might make sense in Postgres 10 or later.
Then the much faster query would simply be:
SELECT "conversationId"
FROM mv_conversation_users c
WHERE users = '{1,4,6}'::int[]; -- sorted array!
db<>fiddle here
You have to weigh added costs to storage, writes and maintenance against benefits to read performance.
Aside: consider legal identifiers without double quotes. conversation_id
instead of "conversationId"
etc.: