1

Say I have three tables, a table of users, a table of around 500 different items, and the corresponding join table. What I would like to do is:

select * from users u join items_users iu on iu.user_id = u.id
where iu.item_id in (1,2,3,4,5)
and u.city_id = 1 limit 10;

Except, instead of an IN condition, I would like to find users that have all the corresponding items. If it helps, assume that the max number of items that will be searched for at a time will be 5. Also, I am using Postgres, and don't mind denormalizing it if would help as it's a read only system and speed is highest priority.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
me-
  • 67
  • 7

1 Answers1

0

It's another case of relational division. We have assembled quite an arsenal of queries to deal with this class of problems here.

In this case, with 5 or more items, I might try:

SELECT u.*
FROM   users AS u
WHERE  u.city_id = 1
AND EXISTS (
   SELECT *
   FROM   items_users AS a
   JOIN   items_users AS b USING (user_id)
   JOIN   items_users AS c USING (user_id)
   ...
   WHERE  a.user_id = u.user_id
   AND    a.item_id = 1
   AND    b.item_id = 2
   AND    c.item_id = 3
   ...
   )
LIMIT 10;

It was among the fastest in my tests and it fits the requirement of multiple criteria on items_users while only returning columns from user.

Read about indexes at the linked answer. these are crucial for performance. As your tables are read-only I would also CLUSTER both tables, to minimize the number of pages that have to be visited. If nothing else, CLUSTER items_users using a multi-column index on (user_id, item_id).

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228