1

I have two tables in a this is Postgres database representing simple orders from a market. A master table with information about the order, and a detail table with containing specifics of the purchase, with a foreign key back to master. Easy enough.

Over thousands of orders from the market, I'd like to find some specific orders based on what was purchased and in what quantity.

I have two more tables, in similar fashion, a master and a child where I create a "pack" and detail items from the market.

For example: Pack A contains 2 Apples and 3 Oranges. I define that in the tables. Now I'd like to find how many orders, and which orders from the market match that specific combination exactly.

It's important that it is an exact match. An order containing additional products or with any different quantity does not match.

In the SQL Fiddle, I've setup the simple example with data. The raw DDL is below. Two of the orders in the table should match Pack A.

http://sqlfiddle.com/#!17/b4f55

CREATE TABLE customer_order(
 order_id serial PRIMARY KEY NOT NULL,
 customer_name VARCHAR(100) NOT NULL
);

CREATE TABLE order_detail(
    id serial PRIMARY KEY,
    order_id INTEGER,
    item_sku VARCHAR(50),
    item_quantity INTEGER,
    FOREIGN KEY(order_id) REFERENCES customer_order(order_id)
);

INSERT INTO customer_order (customer_name) VALUES ('John');
INSERT INTO customer_order (customer_name) VALUES ('Mary');
INSERT INTO customer_order (customer_name) VALUES ('Bill');

INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (1, 'APPLE', 2);
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (1, 'ORANGE', 3);
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (2, 'ORANGE', 5);
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (3, 'APPLE', 2);
INSERT INTO order_detail (order_id, item_sku, item_quantity) VALUES (3, 'ORANGE', 3);

CREATE TABLE pack_master(
 pack_id serial PRIMARY KEY NOT NULL,
 name VARCHAR(100) NOT NULL
);

CREATE TABLE pack_child(
    id serial PRIMARY KEY,
    pack_id INTEGER,
    item_sku VARCHAR(50),
    item_quantity INTEGER,
    FOREIGN KEY(pack_id) REFERENCES pack_master(pack_id)
);

INSERT INTO pack_master (name) VALUES ('Pack A');
INSERT INTO pack_master (name) VALUES ('Pack B');

INSERT INTO pack_child (pack_id, item_sku, item_quantity) VALUES (1, 'APPLE', 2);
INSERT INTO pack_child (pack_id, item_sku, item_quantity) VALUES (1, 'ORANGE', 3);
INSERT INTO pack_child (pack_id, item_sku, item_quantity) VALUES (2, 'GRAPES', 5);
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
G. Ball
  • 379
  • 1
  • 3
  • 9

2 Answers2

2

Assuming that pack_child (pack_id, item_sku), as well as order_detail (order_id, item_sku) are defined UNIQUE, this would work:

SELECT pc.pack_id, od.order_id
FROM   pack_child pc
LEFT   JOIN order_detail od USING (item_sku, item_quantity)
GROUP  BY 1, 2
HAVING count(*) = count(od.id)  -- every item of the pack has a match
AND    NOT EXISTS (
   SELECT
   FROM   order_detail od1
   LEFT   JOIN pack_child pc1 ON pc1.item_sku = od1.item_sku
                             AND pc1.item_quantity = od1.item_quantity
                             AND pc1.pack_id = pc.pack_id
   WHERE  od1.order_id = od.order_id
   AND    pc1.id IS NULL       -- and order has no additional item
   );

Returns all pairs of pack_id and order_id that match exactly.

db<>fiddle here

There are a hundred-and-one alternative ways to write the query. Which is the fastest depends on cardinalities, data distribution, constraints and, most importantly, available indexes.

It's a special application of . Here is an arsenal of techniques:

One alternative, probably faster: create views or materialized views of the parent tables including the item count:

CREATE MATERIALIZED VIEW v_pack_master AS
SELECT *
FROM   pack_master
JOIN  (
   SELECT pack_id, count(*) AS items
   FROM   pack_child
   GROUP  BY 1
   ) c USING (pack_id);

CREATE MATERIALIZED VIEW v_customer_order AS
SELECT *
FROM   customer_order
JOIN  (
   SELECT order_id, count(*) AS items
   FROM   order_detail
   GROUP  BY 1
   ) c USING (order_id);

(Orders typically don't change later, so might be viable candidates for a materialized view.)

Only if there can be many order items, an index might pay (index expressions in this order):

CREATE INDEX foo ON v_customer_order (items, order_id);

The query now only considers orders with a matching item count to begin with:

SELECT * -- pack_id, order_id
FROM   v_pack_master pm
LEFT   JOIN v_customer_order co USING (items)
JOIN   LATERAL (
   SELECT count(*) AS items
   FROM   pack_child pc
   JOIN   order_detail od USING (item_sku, item_quantity)
   WHERE  pc.pack_id  = pm.pack_id
   AND    od.order_id = co.order_id
   ) x USING (items);

.. then, if all items match, we don't have to rule out additional items any more. And we have all columns from the parent table at our disposal right away, to return whatever it is you want to return ...

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • This is really great! I'm working on trying it out on my real db now to see how it looks, but so far this seems to be the answer I was looking for. I'm curious though, if there would eventually be a way to dynamically "detect" the pack combinations from within the existing order data. So instead of having to define the pack ahead of time, if I could say "if an order is duplicated more than 10 times, go ahead and consider that a pack and find all orders it matches". Thats beyond the scope of this question or course, but it's my next thought. – G. Ball Dec 21 '18 at 17:55
  • @G.Ball: A query could be devised in similar fashion to identify all "packs" of > 9 identical orders. Could be called manually or scheduled with crontab or similar or triggered on insert/update/delete, but the last option has a lot of overhead and possible complications. Ask a new question with precise requirements if you want to explore this in detail. – Erwin Brandstetter Dec 21 '18 at 19:36
0

I'd like to find how many orders, and which orders from the market match that specific combination.

From this I would assume that since you have 2 apples orders with the quantity of 2 and 2 orange orders with a quantity of 2 your result should be something like the table below since they exists in the pack with the same item_sku and quantity.

 item_sku  | Count
   --------+------
    Apple  | 2
    Orange | 2

SQL:

SELECT OD.item_sku, count(OD.item_sku)
FROM order_detail as OD
JOIN pack_child as PC
ON OD.item_sku = PC.item_sku
WHERE OD.item_sku = PC.item_sku AND (OD.item_quantity = PC.item_quantity)
GROUP BY OD.item_sku
comphonia
  • 521
  • 3
  • 10