Orders with only these products (GROUP BY and HAVING)

Question

I would like to select orders with only the products with the IDs 1, 2, 3, is it possible:

Here is my actual SQL query:

SELECT COUNT(*) FROM "orders"
INNER JOIN "line_items" ON "line_items"."order_id" = "orders"."id"
INNER JOIN "products" ON "products"."id" = "line_items"."product_id"
WHERE "products"."id" IN (1, 2, 3)
GROUP BY orders.id
HAVING (only line items with these products ids ?)

Any idea?

For example:

Order with products 1, 2       => OK
Order with products 1, 3       => OK
Order with products 1, 2, 3    => OK
Order with products 1, 4       => NOT OK
Order with products 1, 2, 3, 4 => NOT OK

By `only the products with the IDs 1, 2, 3` you mean order must have [1,2,3] all together or either one of them? — Kirti Thorat, Apr 07 '14 at 19:19
`I would like to select orders` - but then you show a query with just a count and no orders at all. Different result, different query. Please clarify. Also: is `(order_id, product_id)` defined `UNIQUE` in your table `line_items`? The table definition would tell us more. — Erwin Brandstetter, Apr 07 '14 at 21:30

score 1 · Answer 1 · answered Apr 07 '14 at 19:49

1

I ended up using array_agg from PostgreSQL and the <@ contains function:

SELECT orders.id FROM "orders"
INNER JOIN "line_items" ON "line_items"."order_id" = "orders"."id"
GROUP BY orders.id
HAVING array_agg(line_items.product_id) <@ ARRAY[1, 2, 3];

answered Apr 07 '14 at 19:49

Dorian

22,759
8
120
116

score 1 · Answer 2 · edited May 23 '17 at 10:30

While your presented solution works (assuming the UNIQUE constraint you kept a secret), it will be painfully slow with bigger tables. It cannot use indexes, so the whole table line_items has to be aggregated before Postgres can apply the filter.

Use instead:

SELECT o.*
FROM  (
   SELECT DISTINCT i.order_id
   FROM   line_items i
   WHERE  i.product_id IN (1,2,3)
   AND    NOT EXISTS (
      SELECT 1 FROM line_items
      WHERE  order_id = i.order_id
      AND    product_id NOT IN (1,2,3)
      )
   ) i
JOIN   orders o ON o.id = i.order_id;

This can use indexes and will typically be faster by orders of magnitude (difference is growing with the size of the table). Normally, product_id, order_id would be integer columns and these two multicolumn indexes would be perfect:

CREATE INDEX foo1_idx ON line_items (product_id, order_id);
CREATE INDEX foo2_idx ON line_items (order_id, product_id);

One of them might be the primary key already, then you only need to add the other. Assuming orders.id is indexed as well. All of this should be in your question.

Why these indexes? Detailed explanation in this related answer on dba.SE:
Is a composite index also good for queries on the first field?

This is a case of relational division. We assembled an arsenal of techniques under this question:
How to filter SQL results in a has-many-through relation

The special difficulty here is to allow various combinations. The result is easier to define by what is not allowed.

Aside: do not needlessly double-quote legal, lower-case identifiers. Makes the code noisy and hard to read.

score 0 · Answer 3 · edited Apr 07 '14 at 19:34

0

Add a WHERE clause like this:

SELECT COUNT(*) FROM "orders"
INNER JOIN "line_items" ON "line_items"."order_id" = "orders"."id"
INNER JOIN "products" ON "products"."id" = "line_items"."product_id"
WHERE "products"."id" IN (1, 2, 3)
      AND "line_items"."product_id" IN (1,2,3)
GROUP BY orders.id

edited Apr 07 '14 at 19:34

Dorian

22,759
8
120
116

answered Apr 07 '14 at 19:06

Jayesh Chandrapal

668
7
11

2

Remove the second WHERE and replace it with AND :-) – Evan Volgas Apr 07 '14 at 19:07
3

I think `"products"."id" IN (1, 2, 3)` and `"line_items"."product_id" IN (1,2,3)` are equivalent so that doesn't change anything. – Dorian Apr 07 '14 at 19:33

Orders with only these products (GROUP BY and HAVING)

3 Answers3